[00:01:37] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, and 3 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3674309 (10EddieGP) Per comments given on the patch, it will need a +1 from #DBA to proceed. I'm moving this on your w... [00:06:22] PROBLEM - Host ps1-a6-eqiad is DOWN: PING CRITICAL - Packet loss = 28%, RTA = 2291.40 ms [00:09:13] PROBLEM - Host ps1-b2-eqiad is DOWN: PING CRITICAL - Packet loss = 37%, RTA = 2215.92 ms [00:09:32] PROBLEM - Host ps1-b3-eqiad is DOWN: PING CRITICAL - Packet loss = 16%, RTA = 2698.29 ms [00:10:11] RECOVERY - Host ps1-a6-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.04 ms [00:10:42] PROBLEM - Host ps1-b5-eqiad is DOWN: PING CRITICAL - Packet loss = 28%, RTA = 2598.28 ms [00:11:01] PROBLEM - Host ps1-b6-eqiad is DOWN: PING CRITICAL - Packet loss = 50%, RTA = 2984.95 ms [00:12:11] PROBLEM - ps1-b1-eqiad-infeed-load-tower-A-phase-X on ps1-b1-eqiad is CRITICAL: SNMP CRITICAL - ps1-b1-eqiad-infeed-load-tower-A-phase-X *-1* [00:12:11] PROBLEM - ps1-b1-eqiad-infeed-load-tower-A-phase-Y on ps1-b1-eqiad is CRITICAL: SNMP CRITICAL - ps1-b1-eqiad-infeed-load-tower-A-phase-Y *-1* [00:12:31] PROBLEM - Host ps1-b8-eqiad is DOWN: PING CRITICAL - Packet loss = 44%, RTA = 2409.11 ms [00:12:51] RECOVERY - Host ps1-b2-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.10 ms [00:13:11] RECOVERY - ps1-b1-eqiad-infeed-load-tower-A-phase-X on ps1-b1-eqiad is OK: SNMP OK - ps1-b1-eqiad-infeed-load-tower-A-phase-X 688 [00:13:11] RECOVERY - ps1-b1-eqiad-infeed-load-tower-A-phase-Y on ps1-b1-eqiad is OK: SNMP OK - ps1-b1-eqiad-infeed-load-tower-A-phase-Y 800 [00:13:20] (03CR) 10Dzahn: "this should work as long as RT never shares a server with another role using Apache again, which i hope won't happen, heh" [puppet] - 10https://gerrit.wikimedia.org/r/382343 (owner: 10Dzahn) [00:13:31] RECOVERY - Host ps1-b3-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.86 ms 
[00:14:31] RECOVERY - Host ps1-b5-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.65 ms [00:14:43] (03CR) 10Dzahn: "@Alex should i do it for this special cases where it won't share the node with other roles? :p" [puppet] - 10https://gerrit.wikimedia.org/r/382343 (owner: 10Dzahn) [00:14:51] RECOVERY - Host ps1-b6-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.22 ms [00:16:21] (03PS2) 10Dzahn: contint: apt-get update before installing packages [puppet] - 10https://gerrit.wikimedia.org/r/382429 (owner: 10Hashar) [00:16:31] RECOVERY - Host ps1-b8-eqiad is UP: PING OK - Packet loss = 0%, RTA = 3.23 ms [00:16:55] (03CR) 10Dzahn: [C: 032] ""not applied on contint1001/contint2001. It is for the Nodepool images"" [puppet] - 10https://gerrit.wikimedia.org/r/382429 (owner: 10Hashar) [00:17:11] PROBLEM - Host ps1-c3-eqiad is DOWN: PING CRITICAL - Packet loss = 44%, RTA = 2579.29 ms [00:17:42] PROBLEM - Host ps1-c4-eqiad is DOWN: PING CRITICAL - Packet loss = 28%, RTA = 2219.43 ms [00:18:22] PROBLEM - Host ps1-c5-eqiad is DOWN: PING CRITICAL - Packet loss = 16%, RTA = 2178.79 ms [00:18:49] (03CR) 10Dzahn: "Paladox, any news?" [puppet] - 10https://gerrit.wikimedia.org/r/368196 (owner: 10Paladox) [00:21:21] RECOVERY - Host ps1-c4-eqiad is UP: PING OK - Packet loss = 0%, RTA = 2.87 ms [00:21:31] RECOVERY - Host ps1-c3-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.75 ms [00:22:01] RECOVERY - Host ps1-c5-eqiad is UP: PING OK - Packet loss = 0%, RTA = 1.30 ms [00:23:24] 10Operations, 10monitoring, 10Patch-For-Review: Uninstall ganglia from the fleet - https://phabricator.wikimedia.org/T177225#3674335 (10Dzahn) Looks like this is the relevant list https://wikitech.wikimedia.org/wiki/Prometheus#Ganglia_plugins to see which plugins have been replaced with what. Can any update... 
[00:25:39] (03CR) 10Dzahn: "grafana dashboards replacing this https://grafana.wikimedia.org/dashboard/db/elasticsearch?orgId=1 et al" [puppet] - 10https://gerrit.wikimedia.org/r/382927 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [00:26:55] (03CR) 10Dzahn: "Grafana dashboards replacing this: https://grafana.wikimedia.org/dashboard/db/dns-recursors?orgId=1" [puppet] - 10https://gerrit.wikimedia.org/r/382929 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [00:34:04] (03CR) 10Dzahn: [C: 031] "so the direct comparison seems to be" [puppet] - 10https://gerrit.wikimedia.org/r/382929 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [00:36:01] RECOVERY - Check whether ferm is active by checking the default input chain on ftp-internal is OK: OK ferm input default policy is set [00:51:03] RECOVERY - Host labweb1002 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [00:51:11] 10Operations, 10DC-Ops: Review and fix PDU settings for syslog/ntp/email servers - https://phabricator.wikimedia.org/T175341#3674359 (10ayounsi) [00:53:54] PROBLEM - nutcracker port on labweb1002 is CRITICAL: connect to address 127.0.0.1 and port 11212: Connection refused [00:53:54] PROBLEM - Check systemd state on labweb1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [00:53:54] PROBLEM - nutcracker process on labweb1002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (nutcracker), command name nutcracker [00:54:43] PROBLEM - puppet last run on labweb1002 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 26 minutes ago with 4 failures. 
Failed resources (up to 3 shown): Package[libssl1.0.0-dbg],Package[libstdc++6-4.8-dbg],Package[libjson-c2-dbg],Package[libboost1.55-dbg] [01:12:01] PROBLEM - HHVM rendering on labweb1002 is CRITICAL: connect to address 208.80.155.109 and port 80: No route to host [01:13:20] PROBLEM - Host labweb1002 is DOWN: CRITICAL - Host Unreachable (208.80.155.109) [01:16:13] (03PS3) 10Dzahn: phabricator: move apache includes to profile [puppet] - 10https://gerrit.wikimedia.org/r/382342 [01:16:47] that's not down to me (labweb1002) [01:18:42] (03PS4) 10Dzahn: phabricator: move apache includes to profile [puppet] - 10https://gerrit.wikimedia.org/r/382342 [01:19:20] (03CR) 10Dzahn: [C: 032] "can't replace the includes yet, but at least they can already be in profile where they should be later" [puppet] - 10https://gerrit.wikimedia.org/r/382342 (owner: 10Dzahn) [01:21:12] andrewbogott: oh, labweb1002 you are moving that one from public to private IP, right.. that explains a lot [01:21:31] let's somehow make Icinga forget the old one [01:21:38] I'm not doing anything at the moment, but I think it's already moved? [01:21:49] Yeah, might be some cached things to purge though, sorry if there were false alarms [01:21:53] for some reason it was new to Icinga [01:21:54] maybe the downtimes just now expired? [01:21:58] oh, right [01:22:36] i dont know how to properly remove it [01:22:40] in this situation [01:22:51] mutante: ok — I think I've done it but don't remember exactly... [01:22:58] I'll downtime for the moment and then will figure out about purging things. [01:23:12] andrewbogott: ok, thanks, cool [01:23:26] it was in mysql right [01:24:16] (03CR) 10Dzahn: "no-op on phab1001/2001" [puppet] - 10https://gerrit.wikimedia.org/r/382342 (owner: 10Dzahn) [01:24:28] sorry for the racket! [01:24:54] no worries [01:37:57] (03CR) 10Krinkle: [C: 04-2] "The Minerva config provides the logo as an image url, from /static, without a version hash. 
Normally unversioned urls are bad for cache, b" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383491 (https://phabricator.wikimedia.org/T169732) (owner: 10Jdlrobson) [02:26:07] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.2) (duration: 08m 52s) [02:26:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:32:46] !log l10nupdate@tin ResourceLoader cache refresh completed at Wed Oct 11 02:32:46 UTC 2017 (duration 6m 39s) [02:32:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:03:48] RECOVERY - Host labweb1001 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [03:03:48] RECOVERY - Host labweb1002 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [03:14:26] umm, can someone explain (or fix) this test failure? https://gerrit.wikimedia.org/r/#/c/383498/ [03:14:28] PROBLEM - Disk space on furud is CRITICAL: DISK CRITICAL - free space: /mnt/2a 1491901 MB (3% inode=96%) [03:16:09] looks like a bug in the test environment [03:16:26] i think you can trigger a rerun by replying "recheck" [03:18:43] ori failed again. 
:( https://gerrit.wikimedia.org/r/#/c/383498/ [03:24:33] I think a root or a release engineer has to go in and delete that file manually [03:24:40] file a phab task [03:24:44] sorry you have to deal with that [03:28:29] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 687.35 seconds [03:28:48] PROBLEM - MariaDB Slave IO: s3 on dbstore2001 is CRITICAL: CRITICAL slave_io_state could not connect [03:28:49] PROBLEM - MariaDB Slave SQL: s3 on dbstore2001 is CRITICAL: CRITICAL slave_sql_state could not connect [03:33:49] 10Operations, 10Operations-Software-Development, 10Jenkins: Test Failure Unrelated to Patch - https://phabricator.wikimedia.org/T177905#3674474 (10dbarratt) [03:34:04] ori ^ [04:32:49] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 245.13 seconds [05:00:22] !log kartik@tin Started deploy [cxserver/deploy@a79b38c]: Update cxserver to 273b515 [05:00:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:03:19] !log kartik@tin Finished deploy [cxserver/deploy@a79b38c]: Update cxserver to 273b515 (duration: 02m 57s) [05:03:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:06:42] !log Dump buffer pool on s4 primary master db1068 - T168661 [05:06:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:06:48] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [05:07:26] dbstore2001:s3 maybe down? [05:08:01] "Connection refused", which is strange [05:08:32] do I try to stop it? [05:08:44] anything from the logs? [05:09:23] oh, it crashed [05:09:32] :( [05:09:32] it is rebooting by itself [05:10:27] assertion failed [05:10:58] we may need to reload it [05:11:15] only s3 crashed? 
[05:11:29] yes [05:11:39] it is on dbstore2002, too [05:11:48] but if it is a physical error [05:12:02] we may need to reload it again fully [05:12:26] the good thing it is that reloading a shard is just a netcat :) [05:12:39] it complaing about uzwiki/querycachetwo [05:12:51] *complained [05:13:39] !log Disable alerts on s4 for 2 hours - T168661 [05:13:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:13:46] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [05:13:59] we could prevently drop it and reimport it on dbstore2002 [05:14:13] I will create a ticket about it and forget it [05:14:31] yeah, good idea [05:15:55] 10Operations, 10ops-codfw, 10DBA: db2038 two disks with predictive failure - https://phabricator.wikimedia.org/T177720#3674533 (10Marostegui) 05Open>03Resolved And all good now! Thanks a lot @Papaul ``` root@db2038:~# hpssacli controller all show config Smart Array P420i in Slot 0 (Embedded) (sn: 00... 
[05:16:20] (03PS7) 10Marostegui: db-eqiad.php: Set commonswiki on read only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382379 (https://phabricator.wikimedia.org/T176883) [05:16:31] (03PS5) 10Marostegui: db1068: Update socket path [puppet] - 10https://gerrit.wikimedia.org/r/382380 (https://phabricator.wikimedia.org/T168661) [05:19:00] buffer pools dirty is 6%, we can lower it to 1% [05:19:08] on db1068 [05:22:46] it says backup finished [05:23:01] on dbstore2001 [05:23:32] (s3) [05:24:13] lowered to 1% \o/ [05:25:48] RECOVERY - MariaDB Slave IO: s3 on dbstore2001 is OK: OK slave_io_state Slave_IO_Running: Yes [05:25:58] RECOVERY - MariaDB Slave SQL: s3 on dbstore2001 is OK: OK slave_sql_state Slave_SQL_Running: Yes [05:26:07] forgetting about it for now [05:35:03] !log Disable puppet on db1068 (s4 primary master) [05:35:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:35:14] !log Disable puppet on db1068 (s4 primary master) - T168661 [05:35:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:35:21] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [05:35:49] (03CR) 10Marostegui: [C: 032] db1068: Update socket path [puppet] - 10https://gerrit.wikimedia.org/r/382380 (https://phabricator.wikimedia.org/T168661) (owner: 10Marostegui) [05:41:47] it is possible mediawiki errors become higher during read only [05:46:08] (03CR) 10KartikMistry: [C: 031] "LGTM." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/364428 (owner: 10Amire80) [05:48:54] !log Package update on db1068 s4 primary master - T168661 [05:49:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:49:01] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [05:51:19] hi marostegui . can you please review https://gerrit.wikimedia.org/r/#/c/364428/ ? [05:51:26] I want to SWAT it later today. 
[05:51:41] <_joe_> aharoni: we're in the middle of a major db maintenance [05:51:53] aharoni: not now, sorry, we are doing a big maintenance [05:51:54] <_joe_> I don't think manuel can talk now :P [05:52:42] jynus: All steps before merging the mediawiki-config and set read-only on the master are now done [05:53:06] At 7:59 I will +2 https://gerrit.wikimedia.org/r/#/c/382379/1 so we can deploy [05:54:00] np, not ultra-urgent :) [05:57:13] maybe ahead [05:57:37] sure [05:57:38] it will take some time for jenkins to run [05:57:44] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Set commonswiki on read only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382379 (https://phabricator.wikimedia.org/T176883) (owner: 10Marostegui) [05:57:48] :) [05:57:53] and until it is deployed, no user change [05:58:16] normally this early it gets merged quite fast [05:59:14] (03Merged) 10jenkins-bot: db-eqiad.php: Set commonswiki on read only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382379 (https://phabricator.wikimedia.org/T176883) (owner: 10Marostegui) [05:59:17] \o/ [05:59:23] (03CR) 10jenkins-bot: db-eqiad.php: Set commonswiki on read only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382379 (https://phabricator.wikimedia.org/T176883) (owner: 10Marostegui) [05:59:54] i will deploy in 10 seconds [05:59:55] fetch rebase on tin [06:00:01] it is all done [06:00:05] cool [06:00:06] deploying [06:00:56] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Set s4 in read-only for maintenance - T168661 (duration: 00m 47s) [06:00:58] !log Set read-only on db1068 s4 primary master - T168661 [06:01:00] I think I lost connection for a second [06:01:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:01:03] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [06:01:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:01:12] All slaves are up to date 
[06:01:13] <_joe_> I can confirm commons is read-only [06:01:22] <_joe_> tried an edit, got bounced [06:01:24] me too [06:01:31] !log Stop MySQL on db1068 (s4 primary master) for upgrade - T168661 [06:02:05] _joe_, sadly the jobrunners will not be as gentle :-) [06:02:52] <_joe_> jynus: uhm do you want me to stop them? [06:02:57] jynus: I see puppet stopped on db1068 messed a bit with the new package [06:02:58] <_joe_> it takes 1 minute [06:03:06] mariadb -> /opt/wmf-mariadb10/service [06:03:12] I will fix that [06:03:19] checking kibana [06:03:46] back [06:03:50] <_joe_> let me know if you need anything from me guys [06:03:59] <_joe_> anything not db-related, I can take care [06:04:09] _joe_, user atention [06:04:19] and general monitoring awareness [06:04:36] user attention as in, if someone reports issues [06:04:42] <_joe_> yes [06:04:46] mysql being stopped now [06:04:50] <_joe_> I'm on #-tech as well [06:04:58] exceptions coming in, but all seem clean [06:05:09] (edit trials) [06:05:22] <_joe_> jynus: if you need it, I can stop the jobrunners [06:05:36] starting mysql on db1068 [06:05:46] _joe_, no [06:06:00] it should stop working now, almost [06:06:07] *start [06:06:11] mysql is up [06:06:20] read only still, I assume? 
[06:06:24] yep [06:06:36] (03PS1) 10Marostegui: Revert "db-eqiad.php: Set commonswiki on read only" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383503 [06:06:37] <_joe_> well mediawiki is read-only as well :P [06:06:55] heartbeat running correctly [06:06:58] all slaves connected [06:07:00] yeah, but job runners do not obey mw config [06:07:04] :-) [06:07:12] not quickly, at least [06:07:14] !log Deploy alter table on s4 primary master - T168661 [06:07:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:07:20] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [06:07:25] <_joe_> jynus: well, they do meaning they can't edit if the db is read-only [06:07:28] alters deployed [06:07:30] checking the tables [06:07:35] _joe_, sure [06:07:45] exceptions gone [06:07:47] <_joe_> in mediawiki terms, I mean [06:07:57] _joe_, it is ok for them to retry [06:08:13] less than 2000 soft "errors" [06:08:23] ~0 now [06:08:53] all good [06:08:54] <_joe_> are you going to revert the read-only only after the end of the alter table? [06:09:15] yeah [06:09:21] all done now [06:09:23] so reverting [06:09:31] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Set commonswiki on read only" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383503 (owner: 10Marostegui) [06:09:51] i will set mysql readonly off once the mediawiki-change is deployed [06:10:25] bad network again :-( [06:11:06] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Set commonswiki on read only" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383503 (owner: 10Marostegui) [06:11:18] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Set commonswiki on read only" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383503 (owner: 10Marostegui) [06:11:59] read only is 1 on the master still [06:12:05] jynus: can you check pt-heartbeat on db1068? 
[06:12:07] on mysql [06:12:12] i am seeing [06:12:16] SELECT max(ts) FROM heartbeat.heartbeat WHERE shard='s4' and datacenter='eqiad'; [06:12:21] 2017-10-11T06:04:38.000480 [06:12:39] I know why [06:12:42] let's run puppet [06:12:43] let me fix it [06:12:49] ? [06:13:17] i will explain in a bit - fixed now [06:13:20] ready to deploy [06:13:38] it is up to date now [06:13:54] maybe it didn't get killed [06:14:07] so puppet didn't override it [06:14:11] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Set s4 to writable mode after maintenance - T168661 (duration: 00m 47s) [06:14:12] !log Set db1068 s4 primary master to read-only OFF - T168661 [06:14:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:14:18] T168661: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661 [06:14:20] ok - all done, read only is OFF [06:14:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:14:32] let's check recentchanges [06:14:52] going to upload a file [06:14:55] I can edit [06:15:04] (03CR) 10Hashar: [V: 032 C: 032] "Surely using base64() was done to prevent json to mangle a password somehow :-]" (031 comment) [labs/private] - 10https://gerrit.wikimedia.org/r/383386 (owner: 10Dduvall) [06:15:37] https://commons.wikimedia.org/wiki/Special:RecentChanges?hidebots=1&hidecategorization=1&hideWikibase=1&limit=50&days=7&urlversion=2 [06:15:42] it is coming back [06:15:57] I can upload things yeah [06:16:27] uploads work, yes [06:16:32] https://commons.wikimedia.org/wiki/Special:NewFiles [06:16:52] I can see 8 files after 8:14 [06:17:04] <_joe_> that dreaded page :P [06:17:17] performance is a bit bad [06:17:24] on the master, but that is to be expected [06:17:31] so it is ok the window was larger [06:17:32] yeah [06:17:45] we could have warmed the tables but i think it is ok [06:17:52] it is getting better [06:18:38] `img_media_type` 
enum('UNKNOWN','BITMAP','DRAWING','AUDIO','VIDEO','MULTIMEDIA','OFFICE','TEXT','EXECUTABLE','ARCHIVE','3D') [06:18:47] good job, manuel [06:19:01] sorry for my connection issues exactly at 6 [06:19:27] looks like the problems in Madrid yesterday arrived there with a bit of delay :p [06:19:35] no, it is my network [06:21:26] anything else left? [06:21:32] the upgrade [06:21:37] mysql_upgrade I mean [06:21:40] going to do it now [06:21:45] sure :-) [06:21:49] done [06:21:50] XD [06:27:13] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, and 3 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3674603 (10jcrespo) That requires an order-by, a limit on a loop, and a waitfor replica on every loop step. [06:34:08] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 232, down: 1, dormant: 0, excluded: 0, unused: 0 [06:34:29] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0 [06:34:32] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, and 3 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3674608 (10TTO) >>! In T176754#3674603, @jcrespo wrote: > That requires an order-by, a limit on a loop, and a waitfor... [06:36:38] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0 [06:37:09] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 234, down: 0, dormant: 0, excluded: 0, unused: 0 [06:37:25] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, and 3 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3674609 (10Marostegui) >>! In T176754#3674608, @TTO wrote: >>>! 
In T176754#3674603, @jcrespo wrote: >> That requires a... [06:41:22] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, and 3 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3674610 (10jcrespo) > will always be very small And recentchanges from wikidata will be a small percentage... https:/... [06:52:43] (03PS1) 10Marostegui: db-eqiad.php: Depool db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383506 (https://phabricator.wikimedia.org/T174509) [06:54:08] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, and 3 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3674614 (10TTO) >>! In T176754#3674610, @jcrespo wrote: >> will always be very small > > And recentchanges from wikid... [06:55:38] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383506 (https://phabricator.wikimedia.org/T174509) (owner: 10Marostegui) [06:57:08] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383506 (https://phabricator.wikimedia.org/T174509) (owner: 10Marostegui) [06:57:17] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383506 (https://phabricator.wikimedia.org/T174509) (owner: 10Marostegui) [06:57:27] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, and 3 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3674617 (10jcrespo) Look, I ask for a loop and probably a SELECT... FOR UPDATE, that should be no more than a 3 line c... 
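The batching approach requested in the T176754 comments above (an ORDER BY, a LIMIT inside a loop, and a wait for replicas after every loop step) can be sketched roughly as follows. This is only an illustration against an in-memory SQLite database, not MediaWiki's actual maintenance code: the `user_groups` / `ug_expiry` schema is simplified, and `purge_expired`, `make_demo_db`, `batch_size` and `wait_for_replicas` are hypothetical names chosen for the example. In production the callback would be a real replication-lag check rather than a no-op.

```python
import sqlite3
from typing import Callable


def make_demo_db() -> sqlite3.Connection:
    """Build a toy table: five expired rows, one future row, one permanent row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_groups (ug_user INTEGER, ug_group TEXT, ug_expiry TEXT)"
    )
    rows = [(i, "sysop", "2017010%d000000" % i) for i in range(1, 6)]
    rows.append((6, "sysop", "20991231000000"))  # expires far in the future
    rows.append((7, "sysop", None))              # permanent membership
    conn.executemany("INSERT INTO user_groups VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def purge_expired(
    conn: sqlite3.Connection,
    now: str,
    batch_size: int = 2,
    wait_for_replicas: Callable[[], None] = lambda: None,
) -> int:
    """Delete expired rows in small ordered batches; return the total deleted."""
    total = 0
    while True:
        # Ordered, limited batch: each statement touches at most batch_size rows.
        cur = conn.execute(
            """
            DELETE FROM user_groups
            WHERE rowid IN (
                SELECT rowid FROM user_groups
                WHERE ug_expiry IS NOT NULL AND ug_expiry < ?
                ORDER BY ug_expiry
                LIMIT ?
            )
            """,
            (now, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            break  # nothing left to purge
        total += cur.rowcount
        # Hypothetical hook: let replicas catch up before the next batch.
        wait_for_replicas()
    return total


if __name__ == "__main__":
    db = make_demo_db()
    print("deleted rows:", purge_expired(db, "20171011000000"))
```

Keeping each DELETE small and pausing between batches is what limits replication lag; the fixed-width timestamp strings are compared lexicographically, which only works because every value has the same length.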
[06:57:57] !log Optimize templatelinks and pagelinks on db1082 - T174509 [06:58:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:58:04] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509 [06:58:14] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1082 - T174509 (duration: 00m 47s) [06:58:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:58:53] (03PS1) 10Marostegui: db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383508 (https://phabricator.wikimedia.org/T174509) [07:00:59] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383508 (https://phabricator.wikimedia.org/T174509) (owner: 10Marostegui) [07:01:49] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383508 (https://phabricator.wikimedia.org/T174509) (owner: 10Marostegui) [07:02:02] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1086 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383508 (https://phabricator.wikimedia.org/T174509) (owner: 10Marostegui) [07:02:26] !log Optimize templatelinks and pagelinks on db1086 - T174509 [07:02:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:03:00] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1086 - T174509 (duration: 00m 47s) [07:03:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:03:06] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509 [07:06:17] (03PS1) 10Marostegui: db-eqiad.php: Promote db1072 to vslow in s3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383510 [07:08:07] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Promote db1072 to vslow in s3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383510 (owner: 
10Marostegui) [07:09:24] (03Merged) 10jenkins-bot: db-eqiad.php: Promote db1072 to vslow in s3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383510 (owner: 10Marostegui) [07:09:37] (03CR) 10jenkins-bot: db-eqiad.php: Promote db1072 to vslow in s3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383510 (owner: 10Marostegui) [07:10:47] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Pool db1072 into the vslow and dump service for s3 - T172679 (duration: 00m 46s) [07:10:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:10:53] T172679: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679 [07:46:08] !log restart commonswiki recentchanges purging T177772 [07:46:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:46:16] T177772: Purge 90% of rows from recentchanges (and posibly defragment) from commonswiki and ruwiki (the ones with source:wikidata) - https://phabricator.wikimedia.org/T177772 [07:48:14] 10Operations, 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10WorkType-Maintenance: Jenkins master / client ssh connection fails due to missing ssh algorithm - https://phabricator.wikimedia.org/T100509#3674684 (10hashar) [07:49:26] (03CR) 10Hashar: "That was T100518" [puppet] - 10https://gerrit.wikimedia.org/r/383120 (https://phabricator.wikimedia.org/T103351) (owner: 10Hashar) [07:50:09] (03PS4) 10Hashar: Jenkins now supports our MAC/KEXY algorithms [prod] [puppet] - 10https://gerrit.wikimedia.org/r/383122 (https://phabricator.wikimedia.org/T100518) [07:50:50] (03CR) 10Hashar: [C: 031] "Rebased and attached to T100518" [puppet] - 10https://gerrit.wikimedia.org/r/383122 (https://phabricator.wikimedia.org/T100518) (owner: 10Hashar) [07:51:56] (03PS1) 10Marostegui: db2080.yaml: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/383511 [07:52:37] (03CR) 10Marostegui: [C: 032] db2080.yaml: Enable notifications [puppet] - 
10https://gerrit.wikimedia.org/r/383511 (owner: 10Marostegui) [07:58:32] 10Operations, 10Continuous-Integration-Infrastructure (shipyard), 10User-Joe: Unify production and CI docker image build process - https://phabricator.wikimedia.org/T177276#3674692 (10Joe) >>! In T177276#3673007, @Legoktm wrote: >>>! In T177276#3671190, @Joe wrote: >> * There is no need for cache busters as... [08:00:05] marostegui and jynus: How many deployers does it take to do Commons 3D deployment (read only) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171011T0800). [08:00:05] No GERRIT patches in the queue for this window AFAICS. [08:00:16] (03PS5) 10Muehlenhoff: Jenkins now supports our MAC/KEXY algorithms [prod] [puppet] - 10https://gerrit.wikimedia.org/r/383122 (https://phabricator.wikimedia.org/T100518) (owner: 10Hashar) [08:01:00] (03CR) 10Muehlenhoff: [C: 032] Jenkins now supports our MAC/KEXY algorithms [prod] [puppet] - 10https://gerrit.wikimedia.org/r/383122 (https://phabricator.wikimedia.org/T100518) (owner: 10Hashar) [08:01:05] mmm [08:01:12] I set the time wrong [08:09:19] (03CR) 10Alexandros Kosiaris: [C: 031] "\o/" [puppet] - 10https://gerrit.wikimedia.org/r/383375 (https://phabricator.wikimedia.org/T158583) (owner: 10Muehlenhoff) [08:13:40] (03CR) 10Marostegui: "Since you pinged me on IRC....I am not sure why I was added to this review to be honest :-)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/364428 (owner: 10Amire80) [08:14:39] (03CR) 10Amire80: "No problem, marostegui. I just hoped that you are familiar with db lists." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/364428 (owner: 10Amire80) [08:15:21] (03CR) 10Marostegui: "> No problem, marostegui. I just hoped that you are familiar with db" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/364428 (owner: 10Amire80) [08:15:54] marostegui: the reason I pinged you is that that patch does a lot of things with dblists, something that I don't touch often. 
[08:16:08] and if I'm not mistaken you did stuff with dblists before. [08:17:10] aharoni: I only really use it for maintenance or operational things over the list of wikis [08:17:12] 10Operations, 10Ops-Access-Requests, 10Analytics: analytics-privatedata-users access for Jeff Green - https://phabricator.wikimedia.org/T177602#3674732 (10elukey) a:03Jgreen @Dzahn I am pretty sure that Jeff needs access to webrequest data due to the fact that they maintain a kafka consumer in the fundrais... [08:17:13] so I don't miss things [08:17:59] 10Operations, 10DBA, 10Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3674735 (10Marostegui) [08:19:30] 10Operations, 10DBA, 10Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#2266773 (10Marostegui) [08:19:39] 10Operations, 10ops-eqiad, 10DBA: Decommission db1035 - https://phabricator.wikimedia.org/T176931#3674736 (10Marostegui) [08:19:47] 10Operations, 10DBA, 10Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3674740 (10Marostegui) [08:19:50] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Decommission db1037 - https://phabricator.wikimedia.org/T174902#3674739 (10Marostegui) [08:23:12] 10Operations, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Encrypt syslog traffic - https://phabricator.wikimedia.org/T136312#3674758 (10fgiunchedi) The eqiad change was reverted yesterday due to (among the problem above) labservices machines hanging and not being able to successfully talk TLS with... [08:25:29] Reedy: around? maybe you can take a look at https://gerrit.wikimedia.org/r/#/c/364428/ ? 
[08:27:31] 10Operations, 10cloud-services-team: Switch labstore servers to default SSH configuration - https://phabricator.wikimedia.org/T177914#3674761 (10MoritzMuehlenhoff) [08:28:41] (03PS1) 10Marostegui: db-eqiad.php: Depool db1103 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383516 (https://phabricator.wikimedia.org/T172679) [08:30:06] (03CR) 10jerkins-bot: [V: 04-1] db-eqiad.php: Depool db1103 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383516 (https://phabricator.wikimedia.org/T172679) (owner: 10Marostegui) [08:32:22] (03PS2) 10Marostegui: db-eqiad.php: Depool db1103 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383516 (https://phabricator.wikimedia.org/T172679) [08:33:00] (03PS3) 10Marostegui: db-eqiad.php: Depool db1103 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383516 (https://phabricator.wikimedia.org/T172679) [08:35:01] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1103 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383516 (https://phabricator.wikimedia.org/T172679) (owner: 10Marostegui) [08:36:00] 10Operations, 10monitoring, 10Patch-For-Review: Uninstall ganglia from the fleet - https://phabricator.wikimedia.org/T177225#3674779 (10fgiunchedi) >>! In T177225#3674335, @Dzahn wrote: > Looks like this is the relevant list https://wikitech.wikimedia.org/wiki/Prometheus#Ganglia_plugins to see which plugins... [08:36:32] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1103 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383516 (https://phabricator.wikimedia.org/T172679) (owner: 10Marostegui) [08:36:50] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1103 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383516 (https://phabricator.wikimedia.org/T172679) (owner: 10Marostegui) [08:36:57] PROBLEM - puppet last run on lvs4002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [08:37:38] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1103 - T164488 (duration: 00m 47s) [08:37:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:37:46] T164488: Run pt-table-checksum on s3 - https://phabricator.wikimedia.org/T164488 [08:37:54] !log Stop replication in sync on db1103 and db1038 to checksum their data - T164488 [08:38:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:38:26] !log reboot kafka-jumbo hosts for kernel updates [08:38:26] 10Operations, 10Prometheus-metrics-monitoring, 10User-fgiunchedi: Improvements to Ganglia-equivalent Prometheus dashboards - https://phabricator.wikimedia.org/T152791#3674785 (10fgiunchedi) 05Open>03Resolved I'm resolving this task as all major use cases have been covered. [08:38:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:49:30] (03CR) 10Ema: [C: 032] varnish reload-vcl: avoid using columns in vcl_label [puppet] - 10https://gerrit.wikimedia.org/r/383369 (https://phabricator.wikimedia.org/T168529) (owner: 10Ema) [08:49:37] (03PS2) 10Ema: varnish reload-vcl: avoid using columns in vcl_label [puppet] - 10https://gerrit.wikimedia.org/r/383369 (https://phabricator.wikimedia.org/T168529) [08:49:42] (03CR) 10Ema: [V: 032 C: 032] varnish reload-vcl: avoid using columns in vcl_label [puppet] - 10https://gerrit.wikimedia.org/r/383369 (https://phabricator.wikimedia.org/T168529) (owner: 10Ema) [08:50:11] !log installing ffmpeg security updates [08:50:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:01:23] (03CR) 10Joal: [C: 031] "LGTM :) Thanks mforns" [puppet] - 10https://gerrit.wikimedia.org/r/383332 (https://phabricator.wikimedia.org/T164497) (owner: 10Mforns) [09:03:17] (03PS1) 10Muehlenhoff: Add library hint for curl [puppet] - 10https://gerrit.wikimedia.org/r/383518 [09:04:19] (03CR) 10Muehlenhoff: [C: 032] Add 
library hint for curl [puppet] - 10https://gerrit.wikimedia.org/r/383518 (owner: 10Muehlenhoff) [09:04:25] (03PS2) 10Muehlenhoff: Add library hint for curl [puppet] - 10https://gerrit.wikimedia.org/r/383518 [09:04:47] !log installing curl security updates on app server canaries (along with HHVM restart) [09:04:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:06:58] RECOVERY - puppet last run on lvs4002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:08:54] (03PS2) 10Elukey: Add cron job for analytics banner activity cleaner [puppet] - 10https://gerrit.wikimedia.org/r/383332 (https://phabricator.wikimedia.org/T164497) (owner: 10Mforns) [09:10:18] (03PS1) 10Giuseppe Lavagetto: base::firewall: rename to profile::base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/383519 [09:10:29] <_joe_> gehel: I wouldn't try to merge this patch though now [09:11:05] yeah, that changes quite a few things... [09:13:02] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383520 [09:13:46] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383520 [09:16:05] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383520 (owner: 10Marostegui) [09:17:28] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383520 (owner: 10Marostegui) [09:17:36] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383520 (owner: 10Marostegui) [09:18:24] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1082 - T174509 (duration: 00m 47s) [09:18:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:18:31] T174509: Drop now redundant indexes from pagelinks and templatelinks - 
https://phabricator.wikimedia.org/T174509 [09:18:32] 10Operations, 10Patch-For-Review: Restructure our internal repositories further - https://phabricator.wikimedia.org/T158583#3674871 (10hashar) For CI the Zend PHP 5.5 packages for jessie landed in `component/ci`. That nicely solved the use case I had. Thank you! [09:21:20] (03PS1) 10Marostegui: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383524 (https://phabricator.wikimedia.org/T174509) [09:24:34] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383524 (https://phabricator.wikimedia.org/T174509) (owner: 10Marostegui) [09:27:25] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383524 (https://phabricator.wikimedia.org/T174509) (owner: 10Marostegui) [09:27:33] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383524 (https://phabricator.wikimedia.org/T174509) (owner: 10Marostegui) [09:27:55] !log Optimize pagelinks and templatelinks on db1087 - T174509 [09:28:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:28:01] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509 [09:28:32] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1087 - T174509 (duration: 00m 46s) [09:28:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:32:22] !log installing libxfont security updates [09:32:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:32:37] (03PS6) 10Volans: setup.py: prepare for PyPi submission [software/cumin] - 10https://gerrit.wikimedia.org/r/382482 [09:32:39] (03PS11) 10Volans: Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) [09:32:41] (03PS11) 10Volans: 
setup.py and tox: spit dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 [09:35:31] (03CR) 10Elukey: [C: 032] Add cron job for analytics banner activity cleaner [puppet] - 10https://gerrit.wikimedia.org/r/383332 (https://phabricator.wikimedia.org/T164497) (owner: 10Mforns) [09:35:33] (03CR) 10Gehel: [C: 031] "Good enough for me!" [software/cumin] - 10https://gerrit.wikimedia.org/r/382479 (https://phabricator.wikimedia.org/T159308) (owner: 10Volans) [09:36:18] (03CR) 10Volans: [C: 032] Docstrings: use Google Style [software/cumin] - 10https://gerrit.wikimedia.org/r/382479 (https://phabricator.wikimedia.org/T159308) (owner: 10Volans) [09:38:40] (03Merged) 10jenkins-bot: Docstrings: use Google Style [software/cumin] - 10https://gerrit.wikimedia.org/r/382479 (https://phabricator.wikimedia.org/T159308) (owner: 10Volans) [09:47:51] (03PS6) 10Muehlenhoff: Use new repository layout for stretch onwards [puppet] - 10https://gerrit.wikimedia.org/r/357559 (https://phabricator.wikimedia.org/T158583) [09:48:22] (03CR) 10jerkins-bot: [V: 04-1] Use new repository layout for stretch onwards [puppet] - 10https://gerrit.wikimedia.org/r/357559 (https://phabricator.wikimedia.org/T158583) (owner: 10Muehlenhoff) [09:52:41] (03PS1) 10Filippo Giunchedi: profile: add check_smart to selectively enable ::smart class [puppet] - 10https://gerrit.wikimedia.org/r/383528 (https://phabricator.wikimedia.org/T86552) [09:52:43] (03PS1) 10Filippo Giunchedi: hieradata: rollout check_smart on a subset of codfw hosts [puppet] - 10https://gerrit.wikimedia.org/r/383529 (https://phabricator.wikimedia.org/T86552) [09:53:15] (03PS2) 10Gehel: maps: isolate maps-test2004 to test vector tiles [puppet] - 10https://gerrit.wikimedia.org/r/383398 (https://phabricator.wikimedia.org/T153282) [09:53:17] (03CR) 10Volans: [C: 032] "Documentation format conversion only, self-merging" [software/cumin] - 10https://gerrit.wikimedia.org/r/382480 (https://phabricator.wikimedia.org/T159308) (owner: 
10Volans) [09:53:47] (03CR) 10jerkins-bot: [V: 04-1] maps: isolate maps-test2004 to test vector tiles [puppet] - 10https://gerrit.wikimedia.org/r/383398 (https://phabricator.wikimedia.org/T153282) (owner: 10Gehel) [09:54:44] (03PS1) 10Muehlenhoff: Add library hint for libxfont [puppet] - 10https://gerrit.wikimedia.org/r/383530 [09:56:47] (03Merged) 10jenkins-bot: Documentation: convert Markdown to reStructuredText [software/cumin] - 10https://gerrit.wikimedia.org/r/382480 (https://phabricator.wikimedia.org/T159308) (owner: 10Volans) [09:56:59] (03PS3) 10Gehel: maps: isolate maps-test2004 to test vector tiles [puppet] - 10https://gerrit.wikimedia.org/r/383398 (https://phabricator.wikimedia.org/T153282) [09:57:07] (03CR) 10Volans: [C: 032] CLI: extract parser definition from parse_args() [software/cumin] - 10https://gerrit.wikimedia.org/r/382481 (owner: 10Volans) [09:59:15] (03Merged) 10jenkins-bot: CLI: extract parser definition from parse_args() [software/cumin] - 10https://gerrit.wikimedia.org/r/382481 (owner: 10Volans) [10:10:56] (03CR) 10Jcrespo: [C: 031] "ok for me about the db2* hosts" [puppet] - 10https://gerrit.wikimedia.org/r/383529 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [10:11:39] (03CR) 10Marostegui: [C: 031] "Looks good for the DB hosts." 
[puppet] - 10https://gerrit.wikimedia.org/r/383529 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [10:13:41] :) [10:13:57] get out of my brain [10:14:13] haha [10:17:36] (03PS1) 10Ema: varnish reload-vcl: ensure VCL name starts with a letter [puppet] - 10https://gerrit.wikimedia.org/r/383533 (https://phabricator.wikimedia.org/T168529) [10:19:07] !log installing curl security updates [10:19:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:19:45] (03CR) 10Ema: [C: 032] varnish reload-vcl: ensure VCL name starts with a letter [puppet] - 10https://gerrit.wikimedia.org/r/383533 (https://phabricator.wikimedia.org/T168529) (owner: 10Ema) [10:22:47] (03PS1) 10Volans: CLI: fix config validation [software/cumin] - 10https://gerrit.wikimedia.org/r/383534 [10:24:55] !log stopping dbstore2001:s3 for maintenance T177908 [10:25:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:25:02] T177908: dbstore2001:s3 crashed while backups were running - https://phabricator.wikimedia.org/T177908 [10:44:07] !log starting OTRS upgrade T176221 [10:44:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:44:13] T176221: Upgrade OTRS to 5.0.23 - https://phabricator.wikimedia.org/T176221 [10:45:50] PROBLEM - OTRS SMTP on mendelevium is CRITICAL: connect to address 10.64.32.174 and port 25: Connection refused [10:45:59] PROBLEM - Check systemd state on mendelevium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [10:51:50] RECOVERY - OTRS SMTP on mendelevium is OK: SMTP OK - 0.004 sec. 
response time [10:51:59] RECOVERY - Check systemd state on mendelevium is OK: OK - running: The system is fully operational [10:55:48] !log OTRS upgrade done T176221 [10:55:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:55:55] T176221: Upgrade OTRS to 5.0.23 - https://phabricator.wikimedia.org/T176221 [11:05:25] (03PS1) 10Ema: hieradata: add cache::misc::nodes to labs.yaml [puppet] - 10https://gerrit.wikimedia.org/r/383536 (https://phabricator.wikimedia.org/T177233) [11:05:27] (03PS1) 10Alexandros Kosiaris: Update Templates for 5.0.23 OTRS version [software/otrs] - 10https://gerrit.wikimedia.org/r/383537 [11:05:52] (03CR) 10Volans: [C: 032] setup.py: prepare for PyPi submission [software/cumin] - 10https://gerrit.wikimedia.org/r/382482 (owner: 10Volans) [11:06:04] (03CR) 10Ema: [C: 032] hieradata: add cache::misc::nodes to labs.yaml [puppet] - 10https://gerrit.wikimedia.org/r/383536 (https://phabricator.wikimedia.org/T177233) (owner: 10Ema) [11:06:36] (03PS1) 10Volans: README: put build-relates icons in the same line [software/cumin] - 10https://gerrit.wikimedia.org/r/383538 [11:07:24] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Update Templates for 5.0.23 OTRS version [software/otrs] - 10https://gerrit.wikimedia.org/r/383537 (owner: 10Alexandros Kosiaris) [11:08:11] (03Merged) 10jenkins-bot: setup.py: prepare for PyPi submission [software/cumin] - 10https://gerrit.wikimedia.org/r/382482 (owner: 10Volans) [11:08:55] (03CR) 10Volans: [C: 032] Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) (owner: 10Volans) [11:10:01] (03PS12) 10Volans: setup.py and tox: split dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 [11:10:03] (03PS2) 10Volans: CLI: fix config validation [software/cumin] - 10https://gerrit.wikimedia.org/r/383534 [11:10:05] (03PS2) 10Volans: README: put build-relates icons in the same line [software/cumin] - 
10https://gerrit.wikimedia.org/r/383538 [11:11:26] !log starting purge of ruwiki.recentchanges T177772 [11:11:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:11:32] T177772: Purge 90% of rows from recentchanges (and posibly defragment) from commonswiki and ruwiki (the ones with source:wikidata) - https://phabricator.wikimedia.org/T177772 [11:13:56] s6 looking good so far [11:14:08] (03Merged) 10jenkins-bot: Documentation: Sphinx setup [software/cumin] - 10https://gerrit.wikimedia.org/r/382483 (https://phabricator.wikimedia.org/T159308) (owner: 10Volans) [11:18:22] 10Operations, 10OTRS: Upgrade OTRS to 5.0.23 - https://phabricator.wikimedia.org/T176221#3675088 (10akosiaris) 05Open>03Resolved Upgrade done, wikimedia templates upgrade as well (in https://gerrit.wikimedia.org/r/#/c/383537/), all looks good, resolving [11:18:44] (03CR) 10Volans: [C: 032] setup.py and tox: split dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 (owner: 10Volans) [11:19:58] !log re-enable OTRS stats group T176221 [11:20:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:20:05] T176221: Upgrade OTRS to 5.0.23 - https://phabricator.wikimedia.org/T176221 [11:21:03] (03Merged) 10jenkins-bot: setup.py and tox: split dependencies [software/cumin] - 10https://gerrit.wikimedia.org/r/382484 (owner: 10Volans) [11:22:01] (03CR) 10Volans: [C: 032] CLI: fix config validation [software/cumin] - 10https://gerrit.wikimedia.org/r/383534 (owner: 10Volans) [11:24:03] (03Merged) 10jenkins-bot: CLI: fix config validation [software/cumin] - 10https://gerrit.wikimedia.org/r/383534 (owner: 10Volans) [11:24:56] (03CR) 10Volans: [C: 032] README: put build-relates icons in the same line [software/cumin] - 10https://gerrit.wikimedia.org/r/383538 (owner: 10Volans) [11:26:50] (03Merged) 10jenkins-bot: README: put build-relates icons in the same line [software/cumin] - 10https://gerrit.wikimedia.org/r/383538 (owner: 10Volans) 
[11:28:41] 10Operations, 10Mail, 10OTRS: E-mails from Qualtrics to OTRS not delivered - https://phabricator.wikimedia.org/T170427#3675115 (10akosiaris) 05Open>03declined I am gonna close this as `declined` as I don't think there is something we can do about it. Feel free to re-open with suggestions though! [11:34:38] !log installing ruby1.9 security updates on trusty hosts [11:34:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:44:38] (03PS1) 10Ema: cp3007: upgrade to Varnish 5 [puppet] - 10https://gerrit.wikimedia.org/r/383541 (https://phabricator.wikimedia.org/T177233) [11:46:26] (03PS7) 10Muehlenhoff: Use new repository layout for stretch onwards [puppet] - 10https://gerrit.wikimedia.org/r/357559 (https://phabricator.wikimedia.org/T158583) [11:46:56] (03CR) 10jerkins-bot: [V: 04-1] Use new repository layout for stretch onwards [puppet] - 10https://gerrit.wikimedia.org/r/357559 (https://phabricator.wikimedia.org/T158583) (owner: 10Muehlenhoff) [11:47:29] (03CR) 10Ema: [C: 032] cp3007: upgrade to Varnish 5 [puppet] - 10https://gerrit.wikimedia.org/r/383541 (https://phabricator.wikimedia.org/T177233) (owner: 10Ema) [11:48:27] (03PS1) 10Hashar: apt spec enhancement for Moritz [puppet] - 10https://gerrit.wikimedia.org/r/383542 [11:49:04] !log upgrading cp3007 to Varnish 5 T177233 [11:49:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:49:11] T177233: Upgrade cache_misc to Varnish 5 - https://phabricator.wikimedia.org/T177233 [11:54:08] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383543 [11:56:18] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383543 (owner: 10Marostegui) [11:57:42] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383543 (owner: 10Marostegui) [11:57:51] (03CR) 10jenkins-bot: 
Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383543 (owner: 10Marostegui) [11:58:46] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1087 - T174509 (duration: 00m 50s) [11:58:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:58:54] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509 [11:59:55] (03PS8) 10Muehlenhoff: Use new repository layout for stretch onwards [puppet] - 10https://gerrit.wikimedia.org/r/357559 (https://phabricator.wikimedia.org/T158583) [12:00:05] hoo: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Wikidata usage tracking deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171011T1200). [12:00:05] No GERRIT patches in the queue for this window AFAICS. [12:04:05] :P [12:04:45] (03CR) 10Hoo man: [C: 032] Move WB client "disabledUsageAspects" setting into $wmgWikibaseDisabledUsageAspects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383439 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [12:05:33] (03CR) 10Thiemo Mättig (WMDE): [C: 031] Move WB client "disabledUsageAspects" setting into $wmgWikibaseDisabledUsageAspects (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383439 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [12:06:00] (03CR) 10Hoo man: "will amend" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383439 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [12:10:39] 10Operations, 10Contributors-Team, 10MobileFrontend, 10wikidiff2, and 3 others: Diff page consistently produces 503 on beta cluster on first visit - https://phabricator.wikimedia.org/T176637#3675211 (10MoritzMuehlenhoff) >>! In T176637#3672117, @jkroll wrote: > That was a misunderstanding then, I thought y... 
[12:14:20] (03PS2) 10Hoo man: Move WB client "disabledUsageAspects" setting into $wmgWikibaseDisabledUsageAspects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383439 (https://phabricator.wikimedia.org/T151717) [12:14:22] (03PS2) 10Hoo man: Enable Statement usage tracking on cawiki and cewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383440 (https://phabricator.wikimedia.org/T151717) [12:19:21] (03CR) 10Hoo man: [C: 032] Move WB client "disabledUsageAspects" setting into $wmgWikibaseDisabledUsageAspects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383439 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [12:20:02] 10Operations, 10wikidiff2, 10User-Addshore, 10WMDE-QWERTY-Team-Board: Update and use php-wikidiff2 to 1.4.1 in production - https://phabricator.wikimedia.org/T177891#3675219 (10MoritzMuehlenhoff) [12:20:31] 10Operations, 10wikidiff2, 10User-Addshore, 10WMDE-QWERTY-Team-Board: Update and use php-wikidiff2 to 1.5 in production - https://phabricator.wikimedia.org/T177891#3675220 (10Addshore) [12:21:07] 10Operations, 10wikidiff2, 10User-Addshore, 10WMDE-QWERTY-Team-Board: Update and use php-wikidiff2 to 1.5 in production - https://phabricator.wikimedia.org/T177891#3674094 (10Addshore) [12:21:23] moritzm: I think the checklist in that ticket is now correct for a rollout then :) [12:21:33] (03Merged) 10jenkins-bot: Move WB client "disabledUsageAspects" setting into $wmgWikibaseDisabledUsageAspects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383439 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [12:21:42] (03CR) 10jenkins-bot: Move WB client "disabledUsageAspects" setting into $wmgWikibaseDisabledUsageAspects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383439 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [12:22:06] addshore: yeah, looks good. I'll update the package on Friday, we can look into updating the mwdebug servers starting next week [12:22:16] awesome! 
:) [12:25:02] !log hoo@tin Started scap: wmf-config/InitialiseSettings.php Introduce $wmgWikibaseDisabledUsageAspects [12:25:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:26:18] dammit, that was supposed to be scap sync-file not scap sync *grr* [12:26:54] Is it safe to abort? Does anyone know? addshore? [12:27:02] *looks* [12:27:18] as far as I know it should be [12:27:21] 10Operations, 10Analytics, 10User-Elukey: Refactor kafka_config.rb and and kafka_cluster_name.rb in puppet to avoid explicit hiera calls - https://phabricator.wikimedia.org/T177927#3675250 (10elukey) [12:27:36] !log hoo@tin scap aborted: wmf-config/InitialiseSettings.php Introduce $wmgWikibaseDisabledUsageAspects (duration: 02m 34s) [12:27:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:28:13] the next full scap shouldn't be hurt by the abort, and no other scaps should be affected [12:28:13] it was still doing stuff locally only… so this should be fine [12:28:16] (03CR) 10Zoranzoki21: [C: 031] Enable Statement usage tracking on cawiki and cewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383440 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [12:28:17] yup [12:28:28] I'll file a bug with scap later on [12:28:46] [= [12:28:57] I gave it a file… that might be a reason to abort [12:29:00] but I'm not sure [12:29:31] wouldn't have happened with the old sync-file… there was no plain sync [12:30:25] !log hoo@tin Synchronized wmf-config/InitialiseSettings.php: Introduce $wmgWikibaseDisabledUsageAspects (duration: 00m 47s) [12:30:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:32:12] !log hoo@tin Synchronized wmf-config/: Move WB client "disabledUsageAspects" setting into $wmgWikibaseDisabledUsageAspects (duration: 00m 48s) [12:32:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:32:54] (03CR) 10Hoo man: [C: 032] Enable Statement usage 
tracking on cawiki and cewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383440 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [12:36:22] (03Merged) 10jenkins-bot: Enable Statement usage tracking on cawiki and cewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383440 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [12:36:31] (03CR) 10jenkins-bot: Enable Statement usage tracking on cawiki and cewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383440 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [12:38:06] !log hoo@tin Synchronized wmf-config/InitialiseSettings.php: Enable Statement usage tracking on cawiki and cewiki (T151717) (duration: 00m 47s) [12:38:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:38:13] T151717: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717 [12:39:39] Works [12:39:49] but usages are coming in at an alarming rate on cawiki [12:44:24] (03PS4) 10Gehel: maps: isolate maps-test2004 to test vector tiles [puppet] - 10https://gerrit.wikimedia.org/r/383398 (https://phabricator.wikimedia.org/T153282) [12:44:42] (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler02/8274/" [puppet] - 10https://gerrit.wikimedia.org/r/383529 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [12:44:53] !log hoo@tin Synchronized wmf-config/InitialiseSettings.php: (temp) Disable Statement usage tracking on cawiki (T151717) (duration: 00m 48s) [12:44:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:44:59] T151717: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717 [12:45:03] hoo: how alarming a rate? 
(im not quite sure what your doing) :D [12:45:13] Like 20k+ a minute [12:46:34] (03PS2) 10Lucas Werkmeister (WMDE): Enable constraint checks on qualifiers and references [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383163 (https://phabricator.wikimedia.org/T176863) [12:48:49] !log Kill recentchanges purge on s4 primary master - https://phabricator.wikimedia.org/T177772 [12:48:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:48:57] (03PS1) 10Hoo man: Disable statement usage tracking on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383556 (https://phabricator.wikimedia.org/T151717) [12:49:53] (03CR) 10Hoo man: [C: 032] Disable statement usage tracking on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383556 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [12:52:11] (03Merged) 10jenkins-bot: Disable statement usage tracking on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383556 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [12:52:25] (03CR) 10jenkins-bot: Disable statement usage tracking on cawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383556 (https://phabricator.wikimedia.org/T151717) (owner: 10Hoo man) [12:53:30] !log Start recentchanges purge on s4 primary master T177772 [12:53:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:38] T177772: Purge 90% of rows from recentchanges (and posibly defragment) from commonswiki and ruwiki (the ones with source:wikidata) - https://phabricator.wikimedia.org/T177772 [12:54:16] !log hoo@tin Synchronized wmf-config/InitialiseSettings.php: Disable Statement usage tracking on cawiki (T151717) (duration: 00m 47s) [12:54:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:54:23] T151717: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717 [12:54:24] (03PS2) 10Muehlenhoff: Add library hint for 
libxfont [puppet] - 10https://gerrit.wikimedia.org/r/383530 [12:54:55] (03PS1) 10Elukey: profile::kafka::broker::monitoring: refactor prometheus metric names [puppet] - 10https://gerrit.wikimedia.org/r/383557 (https://phabricator.wikimedia.org/T177078) [12:55:37] (03CR) 10Muehlenhoff: [C: 032] Add library hint for libxfont [puppet] - 10https://gerrit.wikimedia.org/r/383530 (owner: 10Muehlenhoff) [12:56:52] (03PS5) 10Gehel: maps: isolate maps-test2004 to test vector tiles [puppet] - 10https://gerrit.wikimedia.org/r/383398 (https://phabricator.wikimedia.org/T153282) [12:57:49] (03CR) 10Alexandros Kosiaris: [C: 04-1] apt spec enhancement for Moritz (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/383542 (owner: 10Hashar) [12:58:44] (03PS8) 10Herron: Add letsencrypt certs to mx servers [puppet] - 10https://gerrit.wikimedia.org/r/375427 (https://phabricator.wikimedia.org/T174081) [13:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor My software never has bugs. It just develops random features. Rise for European Mid-day SWAT(Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171011T1300). [13:00:04] Jayprakash12345, gehel, Lucas_WMDE, and aharoni: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[13:01:05] I'm also here for https://gerrit.wikimedia.org/r/#/c/382988/ deploying mapframe to eswiki [13:01:37] o/ [13:01:44] let me finish something quickly :] [13:02:14] jouncebot: o/ [13:02:38] yo, I'm here, too [13:03:20] !log test graphite-web patch on labmon1001 - T177747 [13:03:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:03:27] T177747: grafana-labs often fails to generate graphs with c.datapoints is undefined - https://phabricator.wikimedia.org/T177747 [13:04:05] 10Operations, 10OCG-General, 10Services (watching): Decommission OCG from production - https://phabricator.wikimedia.org/T177931#3675377 (10faidon) [13:05:10] hashar gilles can you give grafana-labs another try re: the graphite issue? [13:05:25] (03CR) 10Herron: "https://puppet-compiler.wmflabs.org/compiler02/8277/" [puppet] - 10https://gerrit.wikimedia.org/r/375427 (https://phabricator.wikimedia.org/T174081) (owner: 10Herron) [13:05:48] godog: I can't trigger the issue anymore, thanks! [13:06:42] gilles: heh I hotfixed it but we need to find a way to have the fix properly available :| I'll update the task [13:06:45] (03CR) 10Elukey: [C: 031] "Looks good to me, checked on a random mw server http://localhost:9002/dump-apc-info and the values looks good. 
Testing is also consistent " [software/hhvm_exporter] - 10https://gerrit.wikimedia.org/r/382728 (https://phabricator.wikimedia.org/T177196) (owner: 10Filippo Giunchedi) [13:07:34] aharoni: I'm around too for testing etc :) [13:07:40] cool [13:08:05] jouncebot: next [13:08:05] In 4 hour(s) and 51 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171011T1800) [13:08:16] (03CR) 10Herron: [C: 032] Add letsencrypt certs to mx servers [puppet] - 10https://gerrit.wikimedia.org/r/375427 (https://phabricator.wikimedia.org/T174081) (owner: 10Herron) [13:08:23] (03PS9) 10Herron: Add letsencrypt certs to mx servers [puppet] - 10https://gerrit.wikimedia.org/r/375427 (https://phabricator.wikimedia.org/T174081) [13:08:31] godog: I guess the easiest is to cherry pick the patch and rebuild a custom package for us :) [13:08:59] gehel: doing your eswiki mapframe [13:09:11] (03CR) 10Hashar: [C: 032] Enable mapframe on eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382988 (https://phabricator.wikimedia.org/T177695) (owner: 10Jayprakash12345) [13:09:38] Jayprakash12345 ^ [13:13:15] (03CR) 10Alexandros Kosiaris: [C: 04-2] "After talking with Moritz, I realized this is more like a PoC. The idea has been incorporated in https://gerrit.wikimedia.org/r/#/c/357559" [puppet] - 10https://gerrit.wikimedia.org/r/383542 (owner: 10Hashar) [13:13:24] (03Abandoned) 10Alexandros Kosiaris: apt spec enhancement for Moritz [puppet] - 10https://gerrit.wikimedia.org/r/383542 (owner: 10Hashar) [13:13:37] hashar: mhh I just noticed we got 0.9.15+debian-2 in jessie-wikimedia/backports, though we'd need to test it first [13:14:05] godog: maybe that one is used for the production grafana already? 
[13:14:37] grbmbmbl [13:14:51] hashar: nope that's 0.13, anyways I'm updating the task [13:15:12] (03PS4) 10Hashar: Enable mapframe on eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382988 (https://phabricator.wikimedia.org/T177695) (owner: 10Jayprakash12345) [13:15:22] (03CR) 10Hashar: [C: 032] Enable mapframe on eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382988 (https://phabricator.wikimedia.org/T177695) (owner: 10Jayprakash12345) [13:16:09] (03PS3) 10Hashar: Enable constraint checks on qualifiers and references [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383163 (https://phabricator.wikimedia.org/T176863) (owner: 10Lucas Werkmeister (WMDE)) [13:16:11] (03PS3) 10Hashar: Configure wmgBabelMainCategory for the Dinka Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366233 (owner: 10Amire80) [13:16:13] (03PS12) 10Hashar: Remove compact language links dblist for simplicity (no-op) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/364428 (owner: 10Amire80) [13:16:17] I have rebased all the patches for SWAT [13:16:46] (03Merged) 10jenkins-bot: Enable mapframe on eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382988 (https://phabricator.wikimedia.org/T177695) (owner: 10Jayprakash12345) [13:16:54] (03CR) 10jenkins-bot: Enable mapframe on eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382988 (https://phabricator.wikimedia.org/T177695) (owner: 10Jayprakash12345) [13:17:18] gehel: patch is finally on mwdebug1001 [13:17:36] debt: your turn! [13:17:52] Lucas_WMDE: around for "Enable constraint checks on qualifiers and references" ? 
:) [13:18:09] hashar: yeah :) [13:18:16] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383163 (https://phabricator.wikimedia.org/T176863) (owner: 10Lucas Werkmeister (WMDE)) [13:19:41] (03Merged) 10jenkins-bot: Enable constraint checks on qualifiers and references [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383163 (https://phabricator.wikimedia.org/T176863) (owner: 10Lucas Werkmeister (WMDE)) [13:19:50] (03CR) 10jenkins-bot: Enable constraint checks on qualifiers and references [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383163 (https://phabricator.wikimedia.org/T176863) (owner: 10Lucas Werkmeister (WMDE)) [13:21:01] Lucas_WMDE: and it is on mwdebug1001 :) [13:21:21] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366233 (owner: 10Amire80) [13:21:22] testing… [13:21:38] aharoni: doing " Configure wmgBabelMainCategory for the Dinka Wikipedia" and will pull it on mwdebug1001 as soon as it merges [13:21:47] (03PS1) 10Muehlenhoff: Add library hint for db5.3 [puppet] - 10https://gerrit.wikimedia.org/r/383561 [13:21:49] thanks [13:21:50] !log installing db5.3 security updates [13:21:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:22:51] (03Merged) 10jenkins-bot: Configure wmgBabelMainCategory for the Dinka Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366233 (owner: 10Amire80) [13:23:00] (03CR) 10jenkins-bot: Configure wmgBabelMainCategory for the Dinka Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366233 (owner: 10Amire80) [13:23:20] aharoni: wmgBabelMainCategory for Dinka wiki is now on mwdebug1001 [13:23:38] hashar: seems to work, thanks :) [13:24:18] Lucas_WMDE: cool! I am syncing it cluster wide [13:24:30] thanks! 
[13:24:36] hashar: seems to work [13:24:45] hashar: mapframe works on eswiki [13:24:55] !log hashar@tin Synchronized wmf-config/Wikibase-production.php: Enable constraint checks on qualifiers and references - T176863 (duration: 00m 47s) [13:25:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:02] gehel: debt : you are awesome :) [13:25:02] T176863: Enable constraint checks on qualifiers and references on Wikidata - https://phabricator.wikimedia.org/T176863 [13:25:02] (03CR) 10Muehlenhoff: [C: 032] Add library hint for db5.3 [puppet] - 10https://gerrit.wikimedia.org/r/383561 (owner: 10Muehlenhoff) [13:25:17] hashar: testing [13:25:19] 10Operations, 10Analytics, 10Traffic, 10User-Elukey: Refactor kafka_config.rb and and kafka_cluster_name.rb in puppet to avoid explicit hiera calls - https://phabricator.wikimedia.org/T177927#3675554 (10ema) p:05Triage>03Normal [13:25:36] gehel: debt: will sync it together with aharoni change when he is done :) [13:25:46] awww, thanks hashar! [13:25:47] ok [13:26:06] hashar: tested, works, all good. [13:27:30] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable mapframe on eswiki - T177695 (duration: 00m 47s) [13:27:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:27:37] T177695: Activate in eswiki - https://phabricator.wikimedia.org/T177695 [13:27:40] 13:27:19 Check 'Logstash Error rate for mw1262.eqiad.wmnet' failed: ERROR: 75% OVER_THRESHOLD (Avg. Error rate: Before: 0.00, After: 4.00, Threshold: 1.00) [13:27:40] bah [13:27:42] 10Operations, 10monitoring, 10Patch-For-Review: Uninstall ganglia from the fleet - https://phabricator.wikimedia.org/T177225#3675559 (10akosiaris) >>! In T177225#3672942, @Dzahn wrote: >>>! In T177225#3669570, @akosiaris wrote: >> it got me thinking how are we going to clean up the fleet ? Should we use pup... 
[13:27:44] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383562 [13:28:24] false alarm [13:29:31] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Configure wmgBabelMainCategory for the Dinka Wikipedia (duration: 00m 47s) [13:29:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:29:42] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/364428 (owner: 10Amire80) [13:30:10] aharoni: and I am doing the compact language thing [13:30:41] hashar: ack [13:31:47] and if one of you could review a basic phpcs tweak for mediawiki-config https://gerrit.wikimedia.org/r/#/c/383234/ that would be great ::] [13:32:00] it is basically meant to stop analyzing files that are autogenerated :) [13:32:30] (03Merged) 10jenkins-bot: Remove compact language links dblist for simplicity (no-op) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/364428 (owner: 10Amire80) [13:32:43] (03CR) 10jenkins-bot: Remove compact language links dblist for simplicity (no-op) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/364428 (owner: 10Amire80) [13:32:48] hashar: thanks! 
[13:33:03] (03CR) 10Ottomata: [C: 031] ":D" [puppet] - 10https://gerrit.wikimedia.org/r/383557 (https://phabricator.wikimedia.org/T177078) (owner: 10Elukey) [13:33:14] aharoni: it is on mwdebug1001 :) [13:33:29] hashar: testing [13:34:28] (03PS2) 10Elukey: profile::kafka::broker::monitoring: refactor prometheus metric names [puppet] - 10https://gerrit.wikimedia.org/r/383557 (https://phabricator.wikimedia.org/T177078) [13:35:03] (03PS2) 10Ottomata: Do not store PopUps events on MySQL [puppet] - 10https://gerrit.wikimedia.org/r/383389 (https://phabricator.wikimedia.org/T176469) (owner: 10Nuria) [13:35:05] (03CR) 10Hashar: [C: 032] PHP CodeSniffer no more process autogenerated files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383234 (owner: 10Hashar) [13:35:18] (03CR) 10jerkins-bot: [V: 04-1] Do not store PopUps events on MySQL [puppet] - 10https://gerrit.wikimedia.org/r/383389 (https://phabricator.wikimedia.org/T176469) (owner: 10Nuria) [13:35:35] hashar: tested, good to go. [13:35:46] aharoni: looks good on gu.wikisource and dewiki [13:36:02] kart_: check out also din.wikipedia and atj.wikipedia [13:36:04] (I already did) [13:36:18] (03Merged) 10jenkins-bot: PHP CodeSniffer no more process autogenerated files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383234 (owner: 10Hashar) [13:36:30] 10Operations, 10Analytics, 10Traffic, 10User-Elukey: Refactor kafka_config.rb and and kafka_cluster_name.rb in puppet to avoid explicit hiera calls - https://phabricator.wikimedia.org/T177927#3675614 (10Ottomata) This would really only require passing `kafka_clusters` as well as `kafka_cluster_name` to the... [13:36:46] !log hashar@tin Synchronized wmf-config/CommonSettings.php: Remove compact language links dblist for simplicity (duration: 00m 47s) [13:36:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:37:48] aharoni: yes. looks good. 
[13:37:54] !log hashar@tin Synchronized dblists: Remove compact language links dblist for simplicity (duration: 00m 47s) [13:38:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:38:04] (03CR) 10jenkins-bot: PHP CodeSniffer no more process autogenerated files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383234 (owner: 10Hashar) [13:38:32] aharoni: kart_: will be on the whole cluster in a few [13:38:49] !log hashar@tin Synchronized docroot: Remove compact language links dblist for simplicity (duration: 00m 48s) [13:38:52] thanks hashar and aharoni ! [13:38:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:38:58] 10Operations, 10ops-eqiad, 10Traffic, 10netops: Upgrade BIOS/RBSU/etc on lvs1007 - https://phabricator.wikimedia.org/T167299#3675624 (10BBlack) Still says `101-I/O ROM Error` twice on every boot attempt, new NIC card has older firmware. PXE boot still doesn't work (tried setting `Boot Strap Type` to `int1... [13:39:48] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Remove compact language links dblist for simplicity (duration: 00m 46s) [13:39:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:30] !log European SWAT completed [13:40:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:39] marostegui: I think you can merge your db patches now [13:41:48] 10Operations, 10monitoring, 10Patch-For-Review: Uninstall ganglia from the fleet - https://phabricator.wikimedia.org/T177225#3675626 (10MoritzMuehlenhoff) >> That, as is at least, would not be enough (and hence the cumin work would be required). But I 'd say that decom class also needs a forced removal of `/... [13:42:06] hashar, kart_ : it's live. all good. thank you! 
[13:42:18] aharoni: kart_ \o/ [13:45:23] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/383557 (https://phabricator.wikimedia.org/T177078) (owner: 10Elukey) [13:49:06] (03PS1) 10BBlack: lvs1009: install stretch because HP sucks [puppet] - 10https://gerrit.wikimedia.org/r/383565 [13:49:08] (03CR) 10Elukey: [C: 032] profile::kafka::broker::monitoring: refactor prometheus metric names [puppet] - 10https://gerrit.wikimedia.org/r/383557 (https://phabricator.wikimedia.org/T177078) (owner: 10Elukey) [13:49:18] (03CR) 10BBlack: [V: 032 C: 032] lvs1009: install stretch because HP sucks [puppet] - 10https://gerrit.wikimedia.org/r/383565 (owner: 10BBlack) [13:50:05] (03Abandoned) 10Hashar: Enable Extension:DynamicPageList to Turkish Wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382357 (https://phabricator.wikimedia.org/T177448) (owner: 10Jayprakash12345) [13:50:22] bblack: lol [13:50:38] (03CR) 10Gilles: "Ping?" [puppet] - 10https://gerrit.wikimedia.org/r/380942 (https://phabricator.wikimedia.org/T150734) (owner: 10Gilles) [13:50:44] (03CR) 10Gilles: "Ping?" 
[puppet] - 10https://gerrit.wikimedia.org/r/380943 (https://phabricator.wikimedia.org/T150734) (owner: 10Gilles) [13:51:18] (03PS2) 10BBlack: lvs1009: install stretch because HP sucks [puppet] - 10https://gerrit.wikimedia.org/r/383565 [13:51:21] (03CR) 10BBlack: [V: 032 C: 032] lvs1009: install stretch because HP sucks [puppet] - 10https://gerrit.wikimedia.org/r/383565 (owner: 10BBlack) [13:52:09] (03PS2) 10Amire80: Deploy Compact Language Links on the German Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383357 (https://phabricator.wikimedia.org/T177836) (owner: 10KartikMistry) [13:58:51] 10Operations, 10Discovery, 10Discovery-Analysis, 10Release-Engineering-Team (Watching / External): Setup a mirror for R language dependencies (CRAN) - https://phabricator.wikimedia.org/T170995#3675668 (10hashar) [13:58:52] (03PS1) 10Ottomata: Rename 10.2.2.38 to druid-public [dns] - 10https://gerrit.wikimedia.org/r/383568 (https://phabricator.wikimedia.org/T176223) [13:59:22] (03CR) 10Ottomata: [C: 032] Rename 10.2.2.38 to druid-public [dns] - 10https://gerrit.wikimedia.org/r/383568 (https://phabricator.wikimedia.org/T176223) (owner: 10Ottomata) [14:01:25] ottomata: hahaha grumpyness never pays (tm) [14:02:51] haha [14:10:04] (03PS1) 10Gehel: New role to validate maps with new vector tiles. [labs/private] - 10https://gerrit.wikimedia.org/r/383573 (https://phabricator.wikimedia.org/T153282) [14:11:08] (03CR) 10Gehel: [V: 032 C: 032] New role to validate maps with new vector tiles. [labs/private] - 10https://gerrit.wikimedia.org/r/383573 (https://phabricator.wikimedia.org/T153282) (owner: 10Gehel) [14:17:56] 10Operations, 10ops-eqiad, 10Traffic, 10netops: Upgrade BIOS/RBSU/etc on lvs1007 - https://phabricator.wikimedia.org/T167299#3675717 (10BBlack) I gave in and tried a stretch network install on lvs1009 for comparison. I didn't make any bios/firmware changes there, just used RBSU console to `onetimeboot net... 
[14:17:58] (03PS1) 10Giuseppe Lavagetto: ocg: remove from load-balancers [puppet] - 10https://gerrit.wikimedia.org/r/383577 [14:18:01] (03PS1) 10Giuseppe Lavagetto: ocg: remove from conftool [puppet] - 10https://gerrit.wikimedia.org/r/383578 (https://phabricator.wikimedia.org/T177931) [14:18:02] (03PS1) 10Giuseppe Lavagetto: ocg100*: convert to role::spare::system [puppet] - 10https://gerrit.wikimedia.org/r/383579 (https://phabricator.wikimedia.org/T188931) [14:18:04] (03PS1) 10Giuseppe Lavagetto: ocg: remove all references from puppet [puppet] - 10https://gerrit.wikimedia.org/r/383580 (https://phabricator.wikimedia.org/T177931) [14:18:29] (03PS1) 10Giuseppe Lavagetto: Remove references to OCG services [dns] - 10https://gerrit.wikimedia.org/r/383581 (https://phabricator.wikimedia.org/T177931) [14:20:04] (03CR) 10BBlack: [C: 031] ocg: remove from load-balancers [puppet] - 10https://gerrit.wikimedia.org/r/383577 (owner: 10Giuseppe Lavagetto) [14:20:32] (03PS1) 10Cmjohnson: Adding production dns for db1107/1108 T177405 [dns] - 10https://gerrit.wikimedia.org/r/383583 [14:21:14] (03CR) 10Alexandros Kosiaris: [C: 031] ocg: remove from load-balancers [puppet] - 10https://gerrit.wikimedia.org/r/383577 (owner: 10Giuseppe Lavagetto) [14:21:33] (03CR) 10Cmjohnson: [C: 032] Adding production dns for db1107/1108 T177405 [dns] - 10https://gerrit.wikimedia.org/r/383583 (owner: 10Cmjohnson) [14:23:18] (03CR) 10Elukey: [C: 031] ocg: remove from load-balancers [puppet] - 10https://gerrit.wikimedia.org/r/383577 (owner: 10Giuseppe Lavagetto) [14:23:45] (03CR) 10Giuseppe Lavagetto: [C: 032] ocg: remove from load-balancers [puppet] - 10https://gerrit.wikimedia.org/r/383577 (owner: 10Giuseppe Lavagetto) [14:23:45] PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.309 second response time [14:24:12] andrewbogott: ^ 
awake yet? [14:27:00] (03CR) 10Muehlenhoff: [C: 031] ocg100*: convert to role::spare::system [puppet] - 10https://gerrit.wikimedia.org/r/383579 (https://phabricator.wikimedia.org/T188931) (owner: 10Giuseppe Lavagetto) [14:27:07] chasemp: I am! No idea what that alert means but I'll look [14:27:15] andrewbogott: https://phabricator.wikimedia.org/T177944 [14:27:18] but that's all I know [14:27:40] actual for real k8s services seem ok [14:27:53] andrewbogott: I think it's just one node going bad but no idea why [14:27:57] that check is all or nothing [14:28:56] (03CR) 10Muehlenhoff: ocg: remove all references from puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/383580 (https://phabricator.wikimedia.org/T177931) (owner: 10Giuseppe Lavagetto) [14:29:46] RECOVERY - All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.196 second response time [14:30:54] 10Operations, 10ops-eqiad, 10Traffic, 10netops: Upgrade BIOS/RBSU/etc on lvs1007 - https://phabricator.wikimedia.org/T167299#3675774 (10BBlack) I figured as a next minimal testing step on lvs1009, should just go into the ethernet firmware (Ctrl+S) and try disabling SR-IOV and/or HP Shared Memory Features,... [14:32:27] (03CR) 10Giuseppe Lavagetto: ocg: remove all references from puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/383580 (https://phabricator.wikimedia.org/T177931) (owner: 10Giuseppe Lavagetto) [14:34:25] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:36:04] (03PS8) 10Rush: openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) [14:36:19] <_joe_> uh? 
[14:36:31] <_joe_> oh right, I should've removed lvs::realserver [14:36:32] (03CR) 10jerkins-bot: [V: 04-1] openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [14:36:50] !log installing dbus update from stretch 9.2 point release [14:36:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:37:26] (03CR) 10Elukey: "Should also site.pp be updated since role::ocg is removed?" [puppet] - 10https://gerrit.wikimedia.org/r/383580 (https://phabricator.wikimedia.org/T177931) (owner: 10Giuseppe Lavagetto) [14:37:48] <_joe_> elukey: previous patch [14:38:06] ack! just seen it [14:41:04] (03PS9) 10Rush: openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) [14:41:32] (03CR) 10jerkins-bot: [V: 04-1] openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [14:42:17] <_joe_> uhm we do have a problem - icinga will alert when I finish running puppet [14:42:24] <_joe_> before it runs on einsteinium [14:45:07] (03PS1) 10Ema: varnish: remove varnishlog.py [puppet] - 10https://gerrit.wikimedia.org/r/383586 [14:45:36] (03CR) 10jerkins-bot: [V: 04-1] varnish: remove varnishlog.py [puppet] - 10https://gerrit.wikimedia.org/r/383586 (owner: 10Ema) [14:45:38] (03CR) 10Rush: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [14:46:05] (03CR) 10jerkins-bot: [V: 04-1] openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [14:47:12] (03PS2) 10Ema: varnish: remove varnishlog.py [puppet] - 10https://gerrit.wikimedia.org/r/383586 [14:47:19] PROBLEM - puppet last run on ocg1003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [14:47:43] <_joe_> !log rolling restart of low-traffic pybals in eqiad for T177931 [14:47:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:47:50] T177931: Decommission OCG from production - https://phabricator.wikimedia.org/T177931 [14:48:20] (03PS1) 10Gehel: New role to validate maps with new vector tiles. [labs/private] - 10https://gerrit.wikimedia.org/r/383588 (https://phabricator.wikimedia.org/T153282) [14:48:32] (03CR) 10Gehel: [V: 032 C: 032] New role to validate maps with new vector tiles. [labs/private] - 10https://gerrit.wikimedia.org/r/383588 (https://phabricator.wikimedia.org/T153282) (owner: 10Gehel) [14:54:09] PROBLEM - PyBal IPVS diff check on lvs1006 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([ocg1001.eqiad.wmnet, ocg1002.eqiad.wmnet, ocg1003.eqiad.wmnet]) [14:54:39] PROBLEM - PyBal IPVS diff check on lvs1003 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([ocg1001.eqiad.wmnet, ocg1002.eqiad.wmnet, ocg1003.eqiad.wmnet]) [14:54:52] <_joe_> fixing that :) [14:55:21] <_joe_> !log manually removing IPVS entries for ocg on eqiad LBs, T177931 [14:55:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:28] T177931: Decommission OCG from production - https://phabricator.wikimedia.org/T177931 [14:57:26] (03PS2) 10Giuseppe Lavagetto: Remove references to OCG services [dns] - 10https://gerrit.wikimedia.org/r/383581 (https://phabricator.wikimedia.org/T177931) [14:57:59] (03CR) 10Giuseppe Lavagetto: [C: 032] Remove references to OCG services [dns] - 10https://gerrit.wikimedia.org/r/383581 (https://phabricator.wikimedia.org/T177931) (owner: 10Giuseppe Lavagetto) [14:59:09] RECOVERY - PyBal IPVS diff check on lvs1006 is OK: OK: no difference between hosts in IPVS/PyBal [14:59:39] RECOVERY - PyBal IPVS diff check on lvs1003 is OK: OK: no difference between hosts in IPVS/PyBal [15:00:01] (03CR) 10Giuseppe 
Lavagetto: [C: 032] ocg: remove from conftool [puppet] - 10https://gerrit.wikimedia.org/r/383578 (https://phabricator.wikimedia.org/T177931) (owner: 10Giuseppe Lavagetto) [15:00:12] (03PS10) 10Rush: openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) [15:01:00] (03PS1) 10Ema: instrumentation: pools with one server are not misconfigured [debs/pybal] - 10https://gerrit.wikimedia.org/r/383591 (https://phabricator.wikimedia.org/T177815) [15:01:02] (03CR) 10jerkins-bot: [V: 04-1] openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [15:01:04] (03PS11) 10Rush: openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) [15:01:31] (03CR) 10jerkins-bot: [V: 04-1] openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [15:02:09] (03CR) 10Giuseppe Lavagetto: [C: 032] ocg100*: convert to role::spare::system [puppet] - 10https://gerrit.wikimedia.org/r/383579 (https://phabricator.wikimedia.org/T188931) (owner: 10Giuseppe Lavagetto) [15:02:55] (03PS12) 10Rush: openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) [15:03:13] (03PS13) 10Rush: openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) [15:03:47] (03CR) 10jerkins-bot: [V: 04-1] openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [15:07:19] RECOVERY - puppet last run on ocg1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:07:20] PROBLEM - OCG health on ocg1003 is CRITICAL: 
CRITICAL: connection error: HTTPConnectionPool(host=localhost, port=8000): Read timed out. (read timeout=5) [15:08:49] <_joe_> !log ocg*: stop ocg; mv /srv/deployment /srv/stale, update-rc.d ocg disable, rm /etc/init/ocg.conf - T177931 [15:08:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:08:57] (03PS14) 10Rush: openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) [15:08:57] T177931: Decommission OCG from production - https://phabricator.wikimedia.org/T177931 [15:09:26] (03CR) 10jerkins-bot: [V: 04-1] openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [15:09:29] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [15:13:08] (03PS6) 10Ottomata: Set up separate druid public-eqiad cluster. [puppet] - 10https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) [15:17:21] PROBLEM - puppet last run on lvs1007 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): Exec[txqueuelen-eth3],Exec[txqueuelen-eth2] [15:21:38] <_joe_> !log stopped redis on ocg* T177931 [15:21:44] (03CR) 10Elukey: Set up separate druid public-eqiad cluster. (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) (owner: 10Ottomata) [15:21:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:21:46] T177931: Decommission OCG from production - https://phabricator.wikimedia.org/T177931 [15:22:20] 10Operations, 10OCG-General, 10Patch-For-Review, 10Services (watching): Decommission OCG from production - https://phabricator.wikimedia.org/T177931#3675975 (10Joe) [15:23:00] (03CR) 10Ottomata: Set up separate druid public-eqiad cluster. 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) (owner: 10Ottomata) [15:28:34] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383562 [15:30:08] (03PS7) 10Ottomata: Set up separate druid public-eqiad cluster. [puppet] - 10https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) [15:30:41] (03CR) 10jerkins-bot: [V: 04-1] Set up separate druid public-eqiad cluster. [puppet] - 10https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) (owner: 10Ottomata) [15:31:09] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383562 (owner: 10Marostegui) [15:33:29] (03CR) 10Giuseppe Lavagetto: [C: 032] ocg: remove all references from puppet [puppet] - 10https://gerrit.wikimedia.org/r/383580 (https://phabricator.wikimedia.org/T177931) (owner: 10Giuseppe Lavagetto) [15:34:10] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383562 (owner: 10Marostegui) [15:35:02] (03PS8) 10Ottomata: Set up separate druid public-eqiad cluster. 
[puppet] - 10https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) [15:35:12] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1086 - T174509 (duration: 00m 47s) [15:35:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:35:19] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509 [15:35:41] (03CR) 10Volans: [C: 031] "LGTM" [debs/pybal] - 10https://gerrit.wikimedia.org/r/383591 (https://phabricator.wikimedia.org/T177815) (owner: 10Ema) [15:35:43] (03PS6) 10Gehel: maps: isolate maps-test2004 to test vector tiles [puppet] - 10https://gerrit.wikimedia.org/r/383398 (https://phabricator.wikimedia.org/T153282) [15:36:13] (03PS9) 10Ottomata: Set up separate druid public-eqiad cluster. [puppet] - 10https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) [15:36:30] (03CR) 10Volans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/383528 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [15:37:09] (03CR) 10Volans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/383529 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [15:37:13] (03PS1) 10Giuseppe Lavagetto: ganglia: remove pdf cluster [puppet] - 10https://gerrit.wikimedia.org/r/383600 (https://phabricator.wikimedia.org/T177931) [15:37:40] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] ganglia: remove pdf cluster [puppet] - 10https://gerrit.wikimedia.org/r/383600 (https://phabricator.wikimedia.org/T177931) (owner: 10Giuseppe Lavagetto) [15:38:15] (03CR) 10Mforns: [C: 031] "LGTM!" 
[puppet] - 10https://gerrit.wikimedia.org/r/383185 (https://phabricator.wikimedia.org/T171629) (owner: 10Nuria) [15:38:20] (03PS1) 10Ottomata: Remove DRUID_HOSTS from list of allowed ferm for config cluster zookeeper [puppet] - 10https://gerrit.wikimedia.org/r/383601 [15:39:26] !log drop tables listed in https://phabricator.wikimedia.org/T171629#3674250 from db1046, db1047, dbstore1002 [15:39:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:40:11] (03CR) 10Elukey: [C: 031] Remove DRUID_HOSTS from list of allowed ferm for config cluster zookeeper [puppet] - 10https://gerrit.wikimedia.org/r/383601 (owner: 10Ottomata) [15:40:31] (03CR) 10Ottomata: [C: 032] Remove DRUID_HOSTS from list of allowed ferm for config cluster zookeeper [puppet] - 10https://gerrit.wikimedia.org/r/383601 (owner: 10Ottomata) [15:40:37] (03PS2) 10Ottomata: Remove DRUID_HOSTS from list of allowed ferm for config cluster zookeeper [puppet] - 10https://gerrit.wikimedia.org/r/383601 [15:40:55] (03CR) 10Ottomata: [V: 032 C: 032] Remove DRUID_HOSTS from list of allowed ferm for config cluster zookeeper [puppet] - 10https://gerrit.wikimedia.org/r/383601 (owner: 10Ottomata) [15:45:54] (03CR) 10Volans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/383316 (owner: 10Muehlenhoff) [15:48:04] (03PS4) 10Nuria: Removing from whitelist tables that no longer exist [puppet] - 10https://gerrit.wikimedia.org/r/383185 (https://phabricator.wikimedia.org/T171629) [15:48:55] 10Operations, 10hardware-requests: Decommission ocg1001-3 - https://phabricator.wikimedia.org/T177958#3676074 (10Joe) [15:50:31] PROBLEM - Host lvs1009 is DOWN: PING CRITICAL - Packet loss = 100% [15:50:36] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1086" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383562 (owner: 10Marostegui) [15:51:04] 10Operations, 10OCG-General, 10Patch-For-Review, 10Services (watching): Decommission OCG from production - 
https://phabricator.wikimedia.org/T177931#3676089 (10Joe) [15:51:14] 10Operations, 10hardware-requests: Decommission ocg1001-3 - https://phabricator.wikimedia.org/T177958#3676090 (10Joe) a:05Joe>03None [15:51:41] 10Operations, 10hardware-requests: Decommission ocg1001-3 - https://phabricator.wikimedia.org/T177958#3676074 (10Joe) [15:52:58] (03PS1) 10Ottomata: Add druid/public/worker.yaml [labs/private] - 10https://gerrit.wikimedia.org/r/383602 [15:53:22] (03CR) 10Ottomata: [V: 032 C: 032] Add druid/public/worker.yaml [labs/private] - 10https://gerrit.wikimedia.org/r/383602 (owner: 10Ottomata) [15:54:59] (03PS5) 10Elukey: Removing from whitelist tables that no longer exist [puppet] - 10https://gerrit.wikimedia.org/r/383185 (https://phabricator.wikimedia.org/T171629) (owner: 10Nuria) [15:56:05] (03CR) 10Elukey: [C: 032] Removing from whitelist tables that no longer exist [puppet] - 10https://gerrit.wikimedia.org/r/383185 (https://phabricator.wikimedia.org/T171629) (owner: 10Nuria) [15:57:59] (03PS10) 10Ottomata: Set up separate druid public-eqiad cluster. [puppet] - 10https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) [15:58:31] (03CR) 10jerkins-bot: [V: 04-1] Set up separate druid public-eqiad cluster. [puppet] - 10https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) (owner: 10Ottomata) [15:59:48] (03PS11) 10Ottomata: Set up separate druid public-eqiad cluster. [puppet] - 10https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) [16:02:52] (03PS12) 10Ottomata: Set up separate druid public-eqiad cluster. [puppet] - 10https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) [16:03:42] (03CR) 10Paladox: [C: 04-1] "> Paladox, any news?" 
[puppet] - 10https://gerrit.wikimedia.org/r/368196 (owner: 10Paladox) [16:05:25] (03PS1) 10Jgreen: add jgreen to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/383604 (https://phabricator.wikimedia.org/T177602) [16:06:42] (03CR) 10Jgreen: [C: 032] add jgreen to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/383604 (https://phabricator.wikimedia.org/T177602) (owner: 10Jgreen) [16:08:02] 10Operations, 10Pybal, 10Traffic: Upgrade LVS servers to stretch - https://phabricator.wikimedia.org/T177961#3676143 (10ema) [16:08:16] (03CR) 10Elukey: [C: 031] add jgreen to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/383604 (https://phabricator.wikimedia.org/T177602) (owner: 10Jgreen) [16:08:41] RECOVERY - Host lvs1009 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms [16:08:46] (03CR) 10Andrew Bogott: [C: 031] hieradata: rollout check_smart on a subset of codfw hosts [puppet] - 10https://gerrit.wikimedia.org/r/383529 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [16:09:35] !log resuming PDU upgrade [16:09:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:09:45] !log decommission maps-test2004 from its cassandra cluster (free a node to test vector tiles) - T153282 [16:09:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:09:52] T153282: [epic] Migrate to a new vector tile structure - https://phabricator.wikimedia.org/T153282 [16:09:56] 10Operations, 10Pybal, 10Traffic: Upgrade LVS servers to stretch - https://phabricator.wikimedia.org/T177961#3676173 (10ema) p:05Triage>03Normal [16:11:38] (03CR) 10Filippo Giunchedi: [C: 031] "Apologies, this fell off my radar." 
[puppet] - 10https://gerrit.wikimedia.org/r/380942 (https://phabricator.wikimedia.org/T150734) (owner: 10Gilles) [16:11:42] ACKNOWLEDGEMENT - Check whether ferm is active by checking the default input chain on ftp-internal is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly Ayounsi working on pdu upgrades [16:11:47] (03PS3) 10Filippo Giunchedi: Thumbor: fix logstash message type [puppet] - 10https://gerrit.wikimedia.org/r/380942 (https://phabricator.wikimedia.org/T150734) (owner: 10Gilles) [16:12:02] PROBLEM - Host lvs1009 is DOWN: PING CRITICAL - Packet loss = 100% [16:12:07] !log Upgrading Jenkins CI T177962 [16:12:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:12:14] T177962: Upgrade Jenkins to 2.73.2 (security release) - https://phabricator.wikimedia.org/T177962 [16:13:21] (03CR) 10Filippo Giunchedi: [C: 032] Thumbor: fix logstash message type [puppet] - 10https://gerrit.wikimedia.org/r/380942 (https://phabricator.wikimedia.org/T150734) (owner: 10Gilles) [16:15:13] !log roll-restart thumbor to apply https://gerrit.wikimedia.org/r/380942 - T150734 [16:15:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:15:21] T150734: Make Thumbor logs available in ELK - https://phabricator.wikimedia.org/T150734 [16:15:34] 10Operations, 10Continuous-Integration-Infrastructure, 10Jenkins, 10Release-Engineering-Team (Kanban): Upgrade Jenkins to 2.73.2 (security release) - https://phabricator.wikimedia.org/T177962#3676183 (10hashar) Installed on contint1001/contint2001 from http://pkg.jenkins-ci.org/debian-stable/binary/jenkins... [16:15:52] PROBLEM - Host ps1-a1-codfw is DOWN: PING CRITICAL - Packet loss = 100% [16:18:50] I'm working on those PDUs ^ [16:21:02] RECOVERY - Host ps1-a1-codfw is UP: PING OK - Packet loss = 0%, RTA = 37.54 ms [16:22:16] (03CR) 10Elukey: Set up separate druid public-eqiad cluster. 
(035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) (owner: 10Ottomata) [16:23:02] PROBLEM - puppet last run on notebook1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:23:36] (03PS1) 10Eevans: Add key material for new deployment-prep Cassandra nodes [labs/private] - 10https://gerrit.wikimedia.org/r/383608 [16:23:50] (03PS1) 10Cmjohnson: Adding mac address to dhcpd file and updating netboot.cfg file for db1107 and db1108 T177405 [puppet] - 10https://gerrit.wikimedia.org/r/383609 [16:24:25] (03CR) 10jerkins-bot: [V: 04-1] Adding mac address to dhcpd file and updating netboot.cfg file for db1107 and db1108 T177405 [puppet] - 10https://gerrit.wikimedia.org/r/383609 (owner: 10Cmjohnson) [16:24:31] seems like someone layed out some minefields [16:24:42] !log Upgrade jenkins Maven integration plugin to 3.0 - T177962 [16:24:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:24:48] T177962: Upgrade Jenkins to 2.73.2 (security release) - https://phabricator.wikimedia.org/T177962 [16:25:35] (03CR) 10Gehel: Thumbor: don't rewrite host value in logstash messages (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/380943 (https://phabricator.wikimedia.org/T150734) (owner: 10Gilles) [16:25:57] (03CR) 10Eevans: [C: 031] Add key material for new deployment-prep Cassandra nodes [labs/private] - 10https://gerrit.wikimedia.org/r/383608 (owner: 10Eevans) [16:26:23] (03PS7) 10Gehel: maps: isolate maps-test2004 to test vector tiles [puppet] - 10https://gerrit.wikimedia.org/r/383398 (https://phabricator.wikimedia.org/T153282) [16:27:26] 10Operations, 10Pybal, 10Traffic: Upgrade LVS servers to stretch - https://phabricator.wikimedia.org/T177961#3676143 (10BBlack) One significant thing to keep in mind is the interface naming changes. We'll be going from e.g. 
`eth[0-3]` to something like `eno[1-2], ens1f[0-1]`, and we'll have to work that int... [16:27:35] (03CR) 10Gilles: Thumbor: don't rewrite host value in logstash messages (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/380943 (https://phabricator.wikimedia.org/T150734) (owner: 10Gilles) [16:27:43] (03PS15) 10Rush: openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) [16:27:50] 10Operations, 10ops-eqiad, 10Traffic, 10netops: Upgrade BIOS/RBSU/etc on lvs1007 - https://phabricator.wikimedia.org/T167299#3676209 (10BBlack) Got arrow keys working in Ctrl-S (thanks @Fgiunchedi !) by re-setting the local terminal. There is no "HP Shared Memory Features" prompt in the current NIC firmwa... [16:28:16] (03CR) 10jerkins-bot: [V: 04-1] openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [16:29:36] 10Operations: Stretch installer "No kernel modules were found" error - https://phabricator.wikimedia.org/T177963#3676216 (10herron) [16:30:08] (03PS16) 10Rush: openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) [16:30:35] 10Operations, 10Ops-Access-Requests, 10Analytics, 10Patch-For-Review: analytics-privatedata-users access for Jeff Green - https://phabricator.wikimedia.org/T177602#3676231 (10Jgreen) 05Open>03Resolved We concluded analytics-privatedata-user makes sense, so I can use hive to come up with hourly hit coun... 
[16:30:39] (03CR) 10jerkins-bot: [V: 04-1] openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [16:33:55] (03PS2) 10Cmjohnson: Adding macs to dhcpd file & updated netboot.cfg db1107/1108 T177405 [puppet] - 10https://gerrit.wikimedia.org/r/383609 [16:34:52] (03CR) 10Cmjohnson: [C: 032] Adding macs to dhcpd file & updated netboot.cfg db1107/1108 T177405 [puppet] - 10https://gerrit.wikimedia.org/r/383609 (owner: 10Cmjohnson) [16:34:59] (03CR) 10Gehel: [C: 031] "LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/380943 (https://phabricator.wikimedia.org/T150734) (owner: 10Gilles) [16:35:31] PROBLEM - puppet last run on analytics1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:35:39] (03CR) 10Gehel: [C: 031] Thumbor: don't rewrite host value in logstash messages (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/380943 (https://phabricator.wikimedia.org/T150734) (owner: 10Gilles) [16:35:44] 10Operations, 10MediaWiki-General-or-Unknown: Update limit.sh to support systemd-based cgroup management - https://phabricator.wikimedia.org/T136603#3676239 (10Krinkle) [16:35:52] (03CR) 10Gehel: [C: 032] maps: isolate maps-test2004 to test vector tiles [puppet] - 10https://gerrit.wikimedia.org/r/383398 (https://phabricator.wikimedia.org/T153282) (owner: 10Gehel) [16:35:53] 10Operations, 10MediaWiki-General-or-Unknown, 10MediaWiki-Platform-Team: Update limit.sh to support systemd-based cgroup management - https://phabricator.wikimedia.org/T136603#2340617 (10Krinkle) [16:36:11] (03PS8) 10Gehel: maps: isolate maps-test2004 to test vector tiles [puppet] - 10https://gerrit.wikimedia.org/r/383398 (https://phabricator.wikimedia.org/T153282) [16:39:12] PROBLEM - puppet last run on stat1005 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [16:39:31] PROBLEM - puppet last run on analytics1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:40:01] PROBLEM - puppet last run on stat1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:40:02] PROBLEM - puppet last run on notebook1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:40:34] ^ I just merged a puppet change that should be entirely unrelated, checking anyway... [16:41:04] gehel: Function Call, has_key(): expects the first argument to be a hash, got "" which is of type String at /etc/puppet/modules/admin/manifests/hashuser.pp:14:8 [16:41:17] not sure what have you merged [16:41:49] https://gerrit.wikimedia.org/r/#/c/383398/ [16:42:08] something only maps related, I don't see how that could conflict (but I have been suprised before...) [16:43:10] given the hosts more likely https://gerrit.wikimedia.org/r/#/c/383604/ [16:43:13] is related [16:43:33] Jeff_Green:^^^ in case is related [16:43:37] yep, looks like a missing coma... [16:43:46] oh yeah [16:43:48] 10Operations, 10Continuous-Integration-Infrastructure, 10Jenkins, 10Release-Engineering-Team (Kanban): Upgrade Jenkins to 2.73.2 (security release) - https://phabricator.wikimedia.org/T177962#3676158 (10hashar) p:05Triage>03High [16:43:59] oh, nice [16:44:19] did I break something? [16:44:59] oh i think i did, I forgot the trailing comma [16:45:10] yep [16:45:27] sorry :-( [16:45:30] didn't noticed it sigh [16:45:44] (03PS1) 10Gehel: amdin / users: fix missing coma [puppet] - 10https://gerrit.wikimedia.org/r/383611 [16:45:58] fix above, if anyone would like to review... [16:46:19] (03CR) 10Elukey: [C: 032] amdin / users: fix missing coma [puppet] - 10https://gerrit.wikimedia.org/r/383611 (owner: 10Gehel) [16:46:25] elukey: thanks! [16:46:33] thank you! [16:46:38] elukey: missing coma? 
:D [16:46:43] thanks/sorry/thanks [16:47:06] argh amdin in the commit msg [16:47:10] There should be a check for that... that's definitely something a computer will catch better than a human :) [16:47:18] it is late I and my brain didn't parse things [16:47:33] (03CR) 10Dzahn: [V: 032 C: 032] Add key material for new deployment-prep Cassandra nodes [labs/private] - 10https://gerrit.wikimedia.org/r/383608 (owner: 10Eevans) [16:47:43] interestingly, python's yamllint doesn't complain about the missing comma [16:48:13] now I'm wondering how it parses it... [16:48:53] all good [16:49:11] PROBLEM - Apache HTTP on mw2105 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:49:12] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [16:49:17] I suspect that jgreen and the next user are considered one "string" that happens to be broken over a line ending [16:50:02] RECOVERY - Apache HTTP on mw2105 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.122 second response time [16:50:08] seems like a good think for a yaml linter to warn about, though (the legit use-case would be wrapping long strings, but line-split string with two short halves should be a red flag?) [16:50:08] (03CR) 10Filippo Giunchedi: [C: 031] Thumbor: don't rewrite host value in logstash messages [puppet] - 10https://gerrit.wikimedia.org/r/380943 (https://phabricator.wikimedia.org/T150734) (owner: 10Gilles) [16:50:12] bblack: shouldn't it then alarm since the user is not right or present in data.yaml? (iirc there is a check for typos in usernames) [16:50:33] (I mean if it was considered one string composed by two usernames) [16:50:42] maybe? 
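Editorial sketch of the check discussed above, in Python. It assumes the group data has already been parsed into plain lists of strings, and that a missing comma shows up as a single entry made of several whitespace-separated names (the hypothesis raised in the channel). The function name `find_merged_members`, the group name, and the `user_a`/`user_b` entries are hypothetical and for illustration only; this is not the check that exists in the puppet repository.

```python
from typing import Dict, Iterable, List, Tuple


def find_merged_members(
    groups: Dict[str, List[str]], known_users: Iterable[str]
) -> List[Tuple[str, str, List[str]]]:
    """Return (group, entry, parts) for entries that look like merged usernames."""
    known = set(known_users)
    suspects = []
    for group, members in groups.items():
        for entry in members:
            parts = entry.split()
            # A single-token entry cannot be two names folded together.
            if len(parts) < 2:
                continue
            # If every fragment is a known user on its own, a comma is
            # most likely missing between them.
            if all(part in known for part in parts):
                suspects.append((group, entry, parts))
    return suspects


# Hypothetical, already-parsed data: one entry holds two usernames.
groups = {
    "example-group": ["user_a", "jgreen bsitzmann", "user_b"],
}
known_users = ["user_a", "user_b", "jgreen", "bsitzmann"]

for group, entry, parts in find_merged_members(groups, known_users):
    print(f"{group}: {entry!r} looks like {len(parts)} users with a missing comma")
```

Checking the fragments against the known user list, rather than only flagging entries that contain a space, keeps the warning specific to the missing-comma case and avoids noise from legitimately wrapped long strings.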
[16:50:42] elukey: what was failing is admin::hashuser line 14 [16:50:43] $uinfo = $::admin::data['users'][$name] [16:50:46] if has_key($uinfo, 'gid') { [16:50:48] the if [16:50:59] https://gerrit.wikimedia.org/r/380943 would need a merge, though I have to go, if there's a volunteer to merge it (should be safe, but implies a logstash restart) otherwise I'll merge it on fri [16:51:00] has_key(): expects the first argument to be a hash, got "" [16:51:03] ahhhh [16:52:00] I'm off soon as well, but if anyone merges that logstash path, use cumin with appropriate class selector, or remember that we have new logstash servers (logstash100[7-9]) [16:53:02] RECOVERY - puppet last run on notebook1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:53:03] yeah it seems yaml parsers treat it as one string "jgreen bsitzmann" [16:53:09] where the proper selector would be 'A:logstash' [16:53:11] so the key lookup on that $name fails [16:54:07] !log demon@tin Pruned MediaWiki: 1.31.0-wmf.1 [keeping static files] (duration: 01m 44s) [16:54:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:54:18] for a generic yaml linter, probably the right heuristic warning would be something like "string is split over two lines, yet both halves are short (probably missing a comma!)" [16:59:01] 10Operations, 10Performance-Team, 10Traffic, 10Wikimedia-Incident: Collect Backend-Timing in Graphite (or Prometheus) - https://phabricator.wikimedia.org/T131894#3676319 (10Krinkle) [16:59:14] 10Operations, 10Ops-Access-Requests: IRC operator request for Freenode #wikimedia-operations for @Dereckson - https://phabricator.wikimedia.org/T177493#3676332 (10RobH) so most of ops team has the rights in the channel to do this, but i've gone ahead and done so basically ops op themselves, then run: ``` /pm... 
[16:59:22] 10Operations, 10Ops-Access-Requests: IRC operator request for Freenode #wikimedia-operations for @Dereckson - https://phabricator.wikimedia.org/T177493#3676333 (10RobH) 05Open>03Resolved a:03RobH [17:00:31] RECOVERY - puppet last run on analytics1001 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [17:06:39] !log demon@tin Pruned MediaWiki: 1.30.0-wmf.16 (duration: 02m 40s) [17:06:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:09:04] !log demon@tin Pruned MediaWiki: 1.30.0-wmf.17 (duration: 02m 20s) [17:09:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:09:25] 10Operations: Some Core availability Catchpoint tests might be more expensive than they need to be - https://phabricator.wikimedia.org/T162857#3676367 (10Krinkle) We created this task at a time we were considering to use Catchpoint for some of our web performance tests. At that time, a lot of credits were alread... [17:09:31] RECOVERY - puppet last run on analytics1002 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [17:10:01] RECOVERY - puppet last run on stat1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:10:02] RECOVERY - puppet last run on notebook1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:12:22] (03PS2) 10Cmjohnson: removing db1023/24 from site.pp for decomm [puppet] - 10https://gerrit.wikimedia.org/r/382495 [17:13:20] (03CR) 10Cmjohnson: [C: 032] removing db1023/24 from site.pp for decomm [puppet] - 10https://gerrit.wikimedia.org/r/382495 (owner: 10Cmjohnson) [17:13:42] (03CR) 10Dzahn: "https://phabricator.wikimedia.org/T166486" [puppet] - 10https://gerrit.wikimedia.org/r/382495 (owner: 10Cmjohnson) [17:14:03] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Decommission db1023 - https://phabricator.wikimedia.org/T166486#3676386 (10Dzahn) 
https://gerrit.wikimedia.org/r/382495 [17:14:53] 10Operations, 10Traffic: Network hardware purchasing for Asia Cache DC - https://phabricator.wikimedia.org/T162683#3676410 (10faidon) [17:21:21] PROBLEM - Check correctness of the icinga configuration on einsteinium is CRITICAL: Icinga configuration contains errors [17:21:29] 10Operations, 10DBA, 10Support-and-Safety, 10Patch-For-Review, 10Wiki-Setup (Create): Create elections committee private wiki - https://phabricator.wikimedia.org/T174370#3676424 (10jrbs) Mind if I assign you @Reedy ? [17:23:12] (03PS1) 10Chad: scap prep: num_procs should be a string [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383615 [17:25:33] (03PS1) 10Chad: group0 to wmf.3 (plus missing symlink to wmf.2....) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383616 [17:25:41] (03CR) 10Thcipriani: [C: 031] scap prep: num_procs should be a string [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383615 (owner: 10Chad) [17:25:46] (03CR) 10Chad: [C: 032] scap prep: num_procs should be a string [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383615 (owner: 10Chad) [17:27:02] (03Merged) 10jenkins-bot: scap prep: num_procs should be a string [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383615 (owner: 10Chad) [17:27:11] (03CR) 10jenkins-bot: scap prep: num_procs should be a string [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383615 (owner: 10Chad) [17:28:28] (03PS1) 10Volans: Backends: catch optional backends import errors [software/cumin] - 10https://gerrit.wikimedia.org/r/383617 [17:28:35] (03CR) 10Chad: [C: 032] group0 to wmf.3 (plus missing symlink to wmf.2....) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383616 (owner: 10Chad) [17:29:44] (03Merged) 10jenkins-bot: group0 to wmf.3 (plus missing symlink to wmf.2....) 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/383616 (owner: 10Chad) [17:29:52] !log demon@tin Synchronized scap/plugins/prep.py: No-op (duration: 01m 49s) [17:29:57] (03CR) 10jenkins-bot: group0 to wmf.3 (plus missing symlink to wmf.2....) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383616 (owner: 10Chad) [17:29:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:31:31] (03CR) 10Gilles: "Have you restarted logstash yet?" [puppet] - 10https://gerrit.wikimedia.org/r/380942 (https://phabricator.wikimedia.org/T150734) (owner: 10Gilles) [17:32:09] !log demon@tin Synchronized php: symlink swap to wmf.2, somehow got missed (duration: 00m 45s) [17:32:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:32:44] (03CR) 10Muehlenhoff: "ocg has been removed from puppet, so this can now be dropped as well." [puppet] - 10https://gerrit.wikimedia.org/r/379486 (owner: 10Muehlenhoff) [17:32:53] (03PS2) 10Muehlenhoff: Remove Trebuchet puppet package provider [puppet] - 10https://gerrit.wikimedia.org/r/379486 [17:34:09] 10Operations, 10DC-Ops: Review and fix PDU settings for syslog/ntp/email servers - https://phabricator.wikimedia.org/T175341#3676522 (10ayounsi) [17:34:21] !log demon@tin Started scap: wmf.3 bootstrap [17:34:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:37:08] (03CR) 10Chad: "Please stop creating 2.15 changes. Time would be better spent making sure we're ready for 2.14." 
[puppet] - 10https://gerrit.wikimedia.org/r/382065 (owner: 10Paladox) [17:40:50] (03CR) 10Muehlenhoff: [C: 032] Remove Trebuchet puppet package provider [puppet] - 10https://gerrit.wikimedia.org/r/379486 (owner: 10Muehlenhoff) [17:41:18] <_joe_> moritzm: \o/ [17:41:29] (03PS1) 10Giuseppe Lavagetto: Rakefile: brown paper bag fix [puppet] - 10https://gerrit.wikimedia.org/r/383620 [17:41:52] <_joe_> volans: ^^ [17:42:25] <_joe_> I'm sure everyone that reviewed the initial change didn't notice the inversion [17:42:41] lol [17:43:04] (03CR) 10Volans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/383620 (owner: 10Giuseppe Lavagetto) [17:44:47] (03Abandoned) 10Dzahn: otrs: apache resources in profile vs include [puppet] - 10https://gerrit.wikimedia.org/r/382338 (owner: 10Dzahn) [17:48:35] (03Abandoned) 10Paladox: Gerrit: Set nullNamePatternMatchesAll to true [puppet] - 10https://gerrit.wikimedia.org/r/382065 (owner: 10Paladox) [17:49:49] !log uploaded jenkins 2.73.2 for jessie-wikimedia to apt.wikimedia.org [17:49:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:50:53] 10Operations, 10Continuous-Integration-Infrastructure, 10Jenkins, 10Release-Engineering-Team (Kanban): Upgrade Jenkins to 2.73.2 (security release) - https://phabricator.wikimedia.org/T177962#3676542 (10MoritzMuehlenhoff) I've uploaded 2.73.2 to apt.wikimedia.org [17:51:21] (03PS2) 10Giuseppe Lavagetto: Rakefile: brown paper bag fix [puppet] - 10https://gerrit.wikimedia.org/r/383620 [17:57:26] 10Operations: Stretch installer "No kernel modules were found" error - https://phabricator.wikimedia.org/T177963#3676549 (10herron) 05Open>03Resolved a:03herron @MoritzMuehlenhoff pointed out that there is an update script in /home/faidon on puppetmaster1001 which fetches current netboot files and copies t... 
[17:58:14] !log ran /home/faidon/update-netboot-stretch.sh on puppetmaster1001 [17:58:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: (Dis)respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171011T1800). Please do the needful. [18:00:04] Amir1: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:12] o/ [18:00:19] Hello, I can SWAT this evening. [18:00:35] hey, where’s the #bothumor? ;) [18:00:50] jouncebot: refresh [18:01:10] I refreshed my knowledge about deployments. [18:01:20] Lucas_WMDE: do you have a change to add? [18:01:34] (03CR) 10Dereckson: [C: 032] labs: Add $wgWBRepoSettings['canonicalUriProperty'] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383376 (https://phabricator.wikimedia.org/T177857) (owner: 10Ladsgroup) [18:01:39] no, sorry, was just wondering about jouncebot’s message [18:01:41] the bothumor is that it doesn't respect humans and therefore isnt here for our entertainment today :) [18:02:49] (03PS2) 10Dereckson: Remove deprecated config variable for Wikibase [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381642 (https://phabricator.wikimedia.org/T129475) (owner: 10Ladsgroup) [18:03:03] Lucas_WMDE: ok :) [18:03:10] 10Operations: Drop #wikimedia-codereview channel - https://phabricator.wikimedia.org/T177974#3676576 (10EddieGP) [18:03:37] mutante: thanks for the merge; do you know if something needs to be done in deployment-prep to apply that? [18:04:12] mutante: i.e. 
it's still not working, and i assume i'm missing a step [18:05:49] urandom: i'm afraid that's because deployment-prep doesn't use the same puppetmaster but i don't know the details about it [18:05:53] (03PS2) 10Dereckson: labs: Add $wgWBRepoSettings['canonicalUriProperty'] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383376 (https://phabricator.wikimedia.org/T177857) (owner: 10Ladsgroup) [18:06:08] (03CR) 10Dereckson: [C: 032] "SWAT, take two" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383376 (https://phabricator.wikimedia.org/T177857) (owner: 10Ladsgroup) [18:06:36] mutante: i can see the outdated git repo on the deployment-prep puppetmaster [18:07:00] eddiegp: that ticket should be linked to all other tickets suggesting to create MORE IRC channels, a good example that "another channel" and "another list" are usually not the best answer, just the first one [18:08:55] Yeah, especially mass of IRC channels floating around in wikimedia land is a horrible mess :) [18:09:06] (03CR) 10Dereckson: [C: 032] "SWAT, take three" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383376 (https://phabricator.wikimedia.org/T177857) (owner: 10Ladsgroup) [18:09:09] urandom: sorry, i dont know about that. i have wondered before if it makes sense to have beta separate from prod but also use it for testing [18:09:48] 10Operations: Drop #wikimedia-codereview channel - https://phabricator.wikimedia.org/T177974#3676592 (10Bawolff) Dropping irc channels is not really an operations thing. The channel founder would be the one who'd have to drop it. Normally, unwanted irc channels arent dropped so much as just abandoned. [18:11:13] 10Operations: Drop #wikimedia-codereview channel - https://phabricator.wikimedia.org/T177974#3676576 (10Dzahn) This ticket should be linked to all other tickets that suggest _adding_ more channels as an example for why it might not be the best idea. Because the "another channel", just like "another list" and "an... 
[18:12:47] Amir1: ah, we've a node on Jenkins for hhvm tests now :) [18:13:10] nice, I hope that make things faster [18:13:15] at least a little [18:13:29] (03Merged) 10jenkins-bot: labs: Add $wgWBRepoSettings['canonicalUriProperty'] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383376 (https://phabricator.wikimedia.org/T177857) (owner: 10Ladsgroup) [18:13:32] it means we're going to have the change merged [18:13:32] here we are [18:13:44] 10Operations: Drop #wikimedia-codereview channel - https://phabricator.wikimedia.org/T177974#3676602 (10EddieGP) I haven't found any "irc-managers" or something like that tag, so #operations seemd to fit best. I thought about "dropping" as making it forward it to another (more popular) channel that will be used... [18:13:45] (03CR) 10jenkins-bot: labs: Add $wgWBRepoSettings['canonicalUriProperty'] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383376 (https://phabricator.wikimedia.org/T177857) (owner: 10Ladsgroup) [18:14:25] 10Operations, 10Analytics, 10Analytics-Cluster, 10Research-management: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#3676603 (10dr0ptp4kt) An update: we looked into this. In short, we got stuck at the point of drivers. It appears that there //may// be a way to pure OpenCL-orient... [18:14:55] 10Operations, 10wikimedia-irc-freenode: Drop #wikimedia-codereview channel - https://phabricator.wikimedia.org/T177974#3676610 (10Legoktm) I added #wikimedia-irc-freenode , if the operator is inactive we can have the group contacts close and redirect the channel. [18:15:00] 18:14:48 sync-file failed: Failed to acquire lock "/var/lock/scap.unknown-but-probably-mediawiki.lock"; owner is "demon"; reason is "wmf.3 bootstrap" [18:15:17] no_justification: you're still deploying train? [18:15:29] Yes. [18:15:35] Swat cancelled, sorry [18:17:03] no_justification: meanwhile, I've merged a no-op change for Amir1, can you sync wmf-config/Wikibase-labs.php when you're done please? 
[18:17:49] I guess.... [18:17:55] thanks [18:18:39] 10Operations, 10wikimedia-irc-freenode: Drop #wikimedia-codereview channel - https://phabricator.wikimedia.org/T177974#3676614 (10Dzahn) >>! In T177974#3676602, @EddieGP wrote: > I haven't found any "irc-managers" or something like that tag, so #operations seemd to fit best. I don't think operations should be... [18:19:17] !log demon@tin Finished scap: wmf.3 bootstrap (duration: 44m 56s) [18:19:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:22:18] (03PS1) 10Volans: README: minor updates [software/cumin] - 10https://gerrit.wikimedia.org/r/383627 [18:23:23] (03CR) 10Volans: [C: 032] Backends: catch optional backends import errors [software/cumin] - 10https://gerrit.wikimedia.org/r/383617 (owner: 10Volans) [18:23:24] !log demon@tin Synchronized wmf-config/Wikibase-labs.php: no-op, labs only (duration: 00m 59s) [18:23:28] Dereckson: ^ [18:23:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:23:54] mutante: JFTR, I'm not using #operations as a random fallback. But phab did not show up anything promising when typing in "irc" in the tag field and I saw that #operations worked on T177493 ;-) [18:23:54] T177493: IRC operator request for Freenode #wikimedia-operations for @Dereckson - https://phabricator.wikimedia.org/T177493 [18:24:35] no_justification: thanks [18:24:52] Amir1: so for your other change, can you reschedule it? 
[18:25:02] (03PS1) 10Cmjohnson: Deleting mgmt entries for ms-be1001-12 & ms-fe1001-4 [dns] - 10https://gerrit.wikimedia.org/r/383628 [18:25:41] okay, let me find a time [18:27:33] (03Merged) 10jenkins-bot: Backends: catch optional backends import errors [software/cumin] - 10https://gerrit.wikimedia.org/r/383617 (owner: 10Volans) [18:27:45] Dereckson: added [18:27:57] Amir1: thanks :) [18:28:36] Actually, you've got time [18:28:42] Seems quick enough [18:28:43] Go ahead [18:29:10] (03CR) 10Volans: [C: 032] README: minor updates [software/cumin] - 10https://gerrit.wikimedia.org/r/383627 (owner: 10Volans) [18:29:20] 10Operations, 10DBA, 10Support-and-Safety, 10Patch-For-Review, 10Wiki-Setup (Create): Create elections committee private wiki - https://phabricator.wikimedia.org/T174370#3559677 (10Dereckson) @jrbs You can let it unassigned, so you've more opportunity for a swift deployment. [18:29:35] eddiegp: no worries at all. thanks. i just think that one makes the most sense to go back to the same people who created it [18:29:43] eddiegp: it depends which channel it is about i guess.. [18:32:07] I don't like randomly cc'ing people just becasue they had commented on something related years back (which is why I removed all the @'s from the quoted comments), but cc'ing the channel creator would have been fine I guess. And you're absolutely right, randomly cc'ing tags is no better than randomly cc'ing people :) [18:32:48] (03Merged) 10jenkins-bot: README: minor updates [software/cumin] - 10https://gerrit.wikimedia.org/r/383627 (owner: 10Volans) [18:33:15] eddiegp: i don't think it's random in this case. people asked to create a channel, now it's about removing the channel, seems normal to me to ask the ones who created it [18:33:58] (03CR) 10Jdlrobson: "After https://gerrit.wikimedia.org/r/383487 this should be uncontroversial. 
I'd like to SWAT this later today @krinkle - are you able to r" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383491 (https://phabricator.wikimedia.org/T169732) (owner: 10Jdlrobson) [18:34:15] also maybe it makes people reconsider if "add another channel" is really the solution next time [18:34:44] * bawolff wouldn't bet on it [18:35:06] The correct answer to "Add another X" is always no, but the path taken always ends up being yes [18:35:27] well, me neither but at least they could see that there is now extra work removing it again [18:35:40] mutante: Yeah, you're right. Just wanted to say why I didn't do so in the first place :) [18:35:51] eddiegp: it's all cool:) thank you for helping [18:36:52] :) [18:40:05] (03CR) 10Hashar: [C: 031] "I have nuked /mnt from both puppet compiler instance. So we can merge this one now :)" [puppet] - 10https://gerrit.wikimedia.org/r/330412 (https://phabricator.wikimedia.org/T146381) (owner: 10Hashar) [18:41:07] (03PS1) 10Chad: group1 to wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383631 [18:42:30] (03PS1) 10Eevans: Add new deployment-prep Cassandra hosts [software/cassandra-twcs] - 10https://gerrit.wikimedia.org/r/383632 [18:42:58] (03CR) 10Eevans: [V: 032 C: 032] Add new deployment-prep Cassandra hosts [software/cassandra-twcs] - 10https://gerrit.wikimedia.org/r/383632 (owner: 10Eevans) [18:46:20] (03CR) 10Rush: "http://puppet-compiler.wmflabs.org/8290/" [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [18:46:31] (03PS17) 10Rush: openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) [18:47:04] (03CR) 10jerkins-bot: [V: 04-1] openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [18:47:17] !log killing stuck osm replication on maps-test2001 [18:47:25] Logged the message 
at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:53:11] 10Operations, 10DC-Ops: Review and fix PDU settings for syslog/ntp/email servers - https://phabricator.wikimedia.org/T175341#3676816 (10ayounsi) [18:53:27] 10Operations, 10DC-Ops: Review and fix PDU settings for syslog/ntp/email servers - https://phabricator.wikimedia.org/T175341#3590787 (10ayounsi) All done here. [18:53:36] 10Operations, 10DC-Ops: Review and fix PDU settings for syslog/ntp/email servers - https://phabricator.wikimedia.org/T175341#3676830 (10ayounsi) 05Open>03Resolved a:03ayounsi [18:54:47] (03PS1) 10Ayounsi: Revert "Add ftp-internal to puppet" [puppet] - 10https://gerrit.wikimedia.org/r/383633 [18:54:57] (03PS2) 10Ayounsi: Revert "Add ftp-internal to puppet" [puppet] - 10https://gerrit.wikimedia.org/r/383633 [18:55:16] (03PS1) 10Ayounsi: Revert "Add partman receipe for ftp-internal" [puppet] - 10https://gerrit.wikimedia.org/r/383636 [18:55:41] (03CR) 10Ayounsi: [C: 032] Revert "Add ftp-internal to puppet" [puppet] - 10https://gerrit.wikimedia.org/r/383633 (owner: 10Ayounsi) [18:56:14] (03PS3) 10Ayounsi: Revert "Add partman receipe for ftp-internal" [puppet] - 10https://gerrit.wikimedia.org/r/383636 [18:56:53] (03CR) 10Ayounsi: [C: 032] Revert "Add partman receipe for ftp-internal" [puppet] - 10https://gerrit.wikimedia.org/r/383636 (owner: 10Ayounsi) [18:57:08] (03PS1) 10Ayounsi: Revert "Add DHCP entry for ftp-internal" [puppet] - 10https://gerrit.wikimedia.org/r/383639 [18:57:21] (03CR) 10Chad: [C: 032] group1 to wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383631 (owner: 10Chad) [18:57:26] (03PS2) 10Ayounsi: Revert "Add DHCP entry for ftp-internal" [puppet] - 10https://gerrit.wikimedia.org/r/383639 [18:58:05] (03CR) 10Ayounsi: [C: 032] Revert "Add DHCP entry for ftp-internal" [puppet] - 10https://gerrit.wikimedia.org/r/383639 (owner: 10Ayounsi) [19:00:04] no_justification: Time to snap out of that daydream and deploy MediaWiki train. Get on with it. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171011T1900). [19:00:05] No GERRIT patches in the queue for this window AFAICS. [19:00:17] (03PS1) 10Ayounsi: Revert "Add DNS/IP allocations for ftp-internal" [dns] - 10https://gerrit.wikimedia.org/r/383640 [19:00:22] (03PS2) 10Ayounsi: Revert "Add DNS/IP allocations for ftp-internal" [dns] - 10https://gerrit.wikimedia.org/r/383640 [19:00:39] Heh, snap out of real dreams more like it #sleepdebt [19:00:48] (03Merged) 10jenkins-bot: group1 to wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383631 (owner: 10Chad) [19:00:52] (03CR) 10Ayounsi: [C: 032] Revert "Add DNS/IP allocations for ftp-internal" [dns] - 10https://gerrit.wikimedia.org/r/383640 (owner: 10Ayounsi) [19:00:59] (03CR) 10jenkins-bot: group1 to wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383631 (owner: 10Chad) [19:01:51] PROBLEM - Host ftp-internal is DOWN: PING CRITICAL - Packet loss = 100% [19:02:44] !log demon@tin Synchronized php: symlink swap (duration: 00m 49s) [19:02:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:15] !log demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 to wmf.3 [19:03:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:07:41] (03PS5) 10Dzahn: Migrate puppet compiler instance from /mnt to /srv [puppet] - 10https://gerrit.wikimedia.org/r/330412 (https://phabricator.wikimedia.org/T146381) (owner: 10Hashar) [19:08:16] mutante: I am ready :) [19:08:39] (03CR) 10Dzahn: [C: 032] Migrate puppet compiler instance from /mnt to /srv [puppet] - 10https://gerrit.wikimedia.org/r/330412 (https://phabricator.wikimedia.org/T146381) (owner: 10Hashar) [19:09:13] (03Abandoned) 10Chad: Stop forcing php5 in `mwscript` [puppet] - 10https://gerrit.wikimedia.org/r/358896 (https://phabricator.wikimedia.org/T146285) (owner: 10Chad) [19:12:51] hashar: thanks :) done [19:14:58] hashar: re: jenkins on releases. 
i have 2.46.2 but that's latest [19:16:18] !log releases1001/releases2001 - apt-get autoremove to drop non-required python packages [19:16:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:16:52] mutante it seems it's because jenkins 2.73.1 was uploaded to jessie [19:16:55] but wasen't to stretch [19:17:10] [18:49:49] !log uploaded jenkins 2.73.2 for jessie-wikimedia to apt.wikimedia.org [19:18:05] paladox: seems right, yea [19:18:11] yep [19:20:53] !log apt: reprepro copy stretch-wikimedia jessie-wikimedia jenkins [19:20:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:22:10] mutante: the puppet compiler seems to work fine! Thanks! [19:22:30] so i copied the jenkins package from jessie to stretcch [19:22:38] and now installing it [19:22:48] currently looking at the diff between package and installed config [19:23:01] will puppet write the config or the package [19:23:30] hashar: :) great [19:23:37] releases hosts are using stretch? 
[19:23:41] PROBLEM - DPKG on releases2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:23:51] PROBLEM - jenkins_service_running on releases2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war [19:23:57] (03PS18) 10Rush: openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) [19:24:01] hashar: of course, it's stable *gg* [19:24:30] (03CR) 10jerkins-bot: [V: 04-1] openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [19:25:49] ACKNOWLEDGEMENT - DPKG on releases2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages daniel_zahn upgrade in progress [19:25:49] ACKNOWLEDGEMENT - jenkins_service_running on releases2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war daniel_zahn upgrade in progress [19:27:41] RECOVERY - DPKG on releases2001 is OK: All packages OK [19:27:51] RECOVERY - jenkins_service_running on releases2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war [19:27:51] !log releases2001 - upgraded jenkins to 2.73.2, kept existing config (vs overwriting with package config) [19:27:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:28:57] 10Operations, 10Continuous-Integration-Infrastructure, 10Jenkins, 10Release-Engineering-Team (Kanban): Upgrade Jenkins to 2.73.2 (security release) - https://phabricator.wikimedia.org/T177962#3677098 (10hashar) 05Open>03Resolved 21:27:51 <@Dzahn> !log releases2001 - upgraded jenkins to 2.73.2, kept exi... [19:29:06] mutante: and that closes https://phabricator.wikimedia.org/T177962 :) Thanks! 
[19:29:24] !log releases1001 - same as 2001, upgraded jenkins to 2.73.2, kept existing config (T177962) [19:29:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:29:33] T177962: Upgrade Jenkins to 2.73.2 (security release) - https://phabricator.wikimedia.org/T177962 [19:30:02] hashar: now :) de rien [19:31:40] and now we also do the new kernel... [19:38:08] (03PS26) 10Paladox: Gerrit: Use systemd::service for systemd [puppet] - 10https://gerrit.wikimedia.org/r/378768 (https://phabricator.wikimedia.org/T157414) [19:40:33] (03PS1) 10Jforrester: Parser: Switch from Tidy to Remex on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383652 (https://phabricator.wikimedia.org/T176150) [19:40:35] (03PS1) 10Jforrester: Parser: Switch from Tidy to Remex on nowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383653 (https://phabricator.wikimedia.org/T177989) [19:47:46] (03CR) 10Rush: [V: 031] openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [19:47:56] (03CR) 10Rush: [V: 031 C: 031] openstack: pdns auth module/role/profile [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [19:52:18] going to deploy a tiny change to the jobrunners https://gerrit.wikimedia.org/r/#/c/359923/ [19:53:27] (03CR) 10Ottomata: Set up separate druid public-eqiad cluster. (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) (owner: 10Ottomata) [19:53:45] (03PS13) 10Ottomata: Set up separate druid public-eqiad cluster. [puppet] - 10https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) [19:57:24] (03PS1) 10Dzahn: base/standard_packages: add dnsutils [puppet] - 10https://gerrit.wikimedia.org/r/383657 [20:00:04] gwicke, cscott, arlolra, subbu, bearND, halfak, and Amir1: #bothumor Q:How do functions break up? 
A:They stop calling each other. Rise for Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171011T2000). [20:00:04] No GERRIT patches in the queue for this window AFAICS. [20:00:22] kik [20:00:24] *lol [20:00:29] no ORES [20:01:22] PROBLEM - puppet last run on labservices1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:01:43] !log disable puppet around cloud things to roll refactor out slowly [20:01:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:03:07] (03CR) 10Rush: [V: 032 C: 032] "We don't want to refactor the mariadb role inline with this refactor so forced to take the -1 for style change atm." [puppet] - 10https://gerrit.wikimedia.org/r/381295 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [20:04:25] !log hashar@tin Started restart [jobrunner/jobrunner@a20d043]: Services now exit(0) when catching a signal - T168044 [20:04:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:04:34] T168044: jobrunner / jobchron systemd services are in error state after a stop - https://phabricator.wikimedia.org/T168044 [20:04:43] limited to mw1299 [20:05:30] !log bsitzmann@tin Started deploy [mobileapps/deploy@15a52ba]: Update mobileapps to c045afc (T177301) [20:05:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:05:38] T177301: Update MCS transforms to account for instead of for inline figures - https://phabricator.wikimedia.org/T177301 [20:07:21] PROBLEM - puppet last run on labtestservices2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 8 seconds ago with 1 failures. 
Failed resources (up to 3 shown): Service[mysql] [20:07:57] !log hashar@tin Started restart [jobrunner/jobrunner@a20d043]: Services now exit(0) when catching a signal - T168044 [20:08:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:40] !log ssastry@tin Started deploy [parsoid/deploy@938e305]: Updating Parsoid to ddf7b293 [20:09:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:54] 10Operations, 10Continuous-Integration-Infrastructure, 10Jenkins, 10Release-Engineering-Team (Kanban): Upgrade Jenkins to 2.73.2 (security release) - https://phabricator.wikimedia.org/T177962#3677334 (10Dzahn) 19:20 mutante: apt: reprepro copy stretch-wikimedia jessie-wikimedia jenkins [20:10:55] !log bsitzmann@tin Finished deploy [mobileapps/deploy@15a52ba]: Update mobileapps to c045afc (T177301) (duration: 05m 25s) [20:11:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:11:04] T177301: Update MCS transforms to account for instead of for inline figures - https://phabricator.wikimedia.org/T177301 [20:11:43] !log hashar@tin Started restart [jobrunner/jobrunner@a20d043]: Services now exit(0) when catching a signal - T168044 [20:11:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:11:51] T168044: jobrunner / jobchron systemd services are in error state after a stop - https://phabricator.wikimedia.org/T168044 [20:13:28] !log hashar@tin Started deploy [jobrunner/jobrunner@a20d043]: Services now exit(0) when catching a signal - T168044 [20:13:30] !log hashar@tin Finished deploy [jobrunner/jobrunner@a20d043]: Services now exit(0) when catching a signal - T168044 (duration: 00m 02s) [20:13:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:13:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:14:09] !log hashar@tin Started deploy [jobrunner/jobrunner@a20d043]: Services now exit(0) when catching 
a signal - T168044 [20:14:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:16:05] (03PS14) 10Ottomata: Set up separate druid public-eqiad cluster. [puppet] - 10https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) [20:17:03] !log hashar@tin Finished deploy [jobrunner/jobrunner@a20d043]: Services now exit(0) when catching a signal - T168044 (duration: 02m 54s) [20:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:17:12] T168044: jobrunner / jobchron systemd services are in error state after a stop - https://phabricator.wikimedia.org/T168044 [20:17:16] (03CR) 10Ottomata: "Luca, feel free to amend and merge this when you get back online Thursday morning. I'm reluctant to do so now since it is almost the end " [puppet] - 10https://gerrit.wikimedia.org/r/380804 (https://phabricator.wikimedia.org/T176223) (owner: 10Ottomata) [20:18:22] !log ssastry@tin Finished deploy [parsoid/deploy@938e305]: Updating Parsoid to ddf7b293 (duration: 08m 42s) [20:18:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:20:48] 10Operations, 10JobRunner-Service, 10Beta-Cluster-reproducible, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): jobrunner / jobchron systemd services are in error state after a stop - https://phabricator.wikimedia.org/T168044#3677405 (10hashar) 05stalled>03Resolved Solved. All kudos/credits... 
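Editorial note on the jobrunner deploy above: the log message says the services "now exit(0) when catching a signal", and T168044 describes the units ending up in an error state after a stop. The actual change is in the jobrunner repository and is not shown in this log. The following is only a minimal Python sketch of the general idea, assuming the error state came from a non-zero exit status on stop; the function names `exit_cleanly` and `install_clean_exit` are hypothetical.

```python
import signal
import sys


def exit_cleanly(signum, frame):
    """Signal handler: end the process with exit status 0.

    sys.exit(0) raises SystemExit(0), so pending `finally` blocks and
    atexit hooks still run before the interpreter exits.
    """
    sys.exit(0)


def install_clean_exit(signals=(signal.SIGTERM, signal.SIGINT)):
    """Register exit_cleanly for each signal in `signals`.

    Must be called from the main thread, as required by signal.signal().
    """
    for sig in signals:
        signal.signal(sig, exit_cleanly)
```

A long-running worker would call `install_clean_exit()` once at startup, before entering its main loop, so that a stop request from the service manager ends with status 0 instead of an error code.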
[20:21:22] RECOVERY - puppet last run on labservices1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [20:22:05] !log updated Parsoid to ddf7b293 (T138492, T135667, T177612) [20:22:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:22:16] T135667: Please put templates on separate lines (when it won't break things) and don't (when it will) - https://phabricator.wikimedia.org/T135667 [20:22:16] T177612: Transclusion data-mw attributes get munged together - https://phabricator.wikimedia.org/T177612 [20:22:16] T138492: Add support for actual format strings to TemplateData's "format" parameter - https://phabricator.wikimedia.org/T138492 [20:24:31] (03PS1) 10Volans: CHANGELOG: add changelogs for release v1.2.2 [software/cumin] - 10https://gerrit.wikimedia.org/r/383663 [20:27:17] (03CR) 10Volans: [C: 032] CHANGELOG: add changelogs for release v1.2.2 [software/cumin] - 10https://gerrit.wikimedia.org/r/383663 (owner: 10Volans) [20:27:56] gehel: ebernhar|lunch not sure who to ping directly but I think possibly a change from maps-test is giving icinga an issue atm it's failing to reload for any new config w: [20:27:57] Error: Could not find any hostgroup matching 'maps-test-vectortiles_codfw' (config file '/etc/icinga/puppet_hosts.cfg', starting on line 20025) [20:30:12] chasemp: agree, it's I004790c3528cf4e3a5a23329eed5df67358739fd [20:31:36] I'm about to hop into a meeting volans, I'll make a ticket post if needed, short on time [20:32:18] (03Merged) 10jenkins-bot: CHANGELOG: add changelogs for release v1.2.2 [software/cumin] - 10https://gerrit.wikimedia.org/r/383663 (owner: 10Volans) [20:32:36] modules/monitoring/manifests/group.pp [20:32:52] that's what creates the missing group [20:32:56] mutante: adding it to hieradata/common/monitoring.yaml should be enough [20:33:08] ah, that's where they are, cool, yea [20:33:30] if you could do a patch will be great, I'm in a middle of a cumin release :D [20:33:39] 
ok, doing it [20:33:43] thanks! [20:33:51] PROBLEM - puppet last run on labtestservices2002 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/powerdns/pdns.d/],Service[mysql] [20:33:51] there is a maps section there [20:37:06] (03PS1) 10Dzahn: maps/icinga: add missing hostgroup maps-test-vectortiles_codfw [puppet] - 10https://gerrit.wikimedia.org/r/383666 [20:39:17] mutante: nitpick, vectortiles->vector tiles :D [20:39:34] and a newline before # MediaWiki :-P [20:39:39] ofc it's ok as it is too ;) [20:43:47] (03PS2) 10Dzahn: maps/icinga: add missing hostgroup maps-test-vectortiles_codfw [puppet] - 10https://gerrit.wikimedia.org/r/383666 [20:44:47] (03CR) 10Dzahn: [C: 032] maps/icinga: add missing hostgroup maps-test-vectortiles_codfw [puppet] - 10https://gerrit.wikimedia.org/r/383666 (owner: 10Dzahn) [20:46:06] (03CR) 10Dzahn: "follow-up: https://gerrit.wikimedia.org/r/#/c/383666/" [puppet] - 10https://gerrit.wikimedia.org/r/383398 (https://phabricator.wikimedia.org/T153282) (owner: 10Gehel) [20:46:26] mutante, volans, chasemp: thanks a lot for the fix! My mistake... [20:47:22] yw, no problem gehel ;) [20:48:45] np! i'm running puppet on einsteinium.. 
takes a while [20:59:16] (03CR) 10Gehel: Backends: catch optional backends import errors (031 comment) [software/cumin] - 10https://gerrit.wikimedia.org/r/383617 (owner: 10Volans) [21:00:07] (03PS1) 10Dzahn: cache::misc/releases: tmp disable codfw backend [puppet] - 10https://gerrit.wikimedia.org/r/383683 [21:00:07] Info: Monitoring::Group[maps-test-vectortiles_codfw]: Scheduling refresh of Service[icinga] [21:01:02] Total Errors: 0 [21:01:07] Things look okay - No serious problems were detected during the pre-flight check [21:01:19] RECOVERY - Check correctness of the icinga configuration on einsteinium is OK: Icinga configuration is correct [21:01:21] chasemp: ^ it's fixed [21:03:06] (03PS2) 10Dzahn: cache::misc/releases: tmp disable codfw backend [puppet] - 10https://gerrit.wikimedia.org/r/383683 [21:04:33] (03CR) 10Dzahn: [C: 032] cache::misc/releases: tmp disable codfw backend [puppet] - 10https://gerrit.wikimedia.org/r/383683 (owner: 10Dzahn) [21:05:24] (03CR) 10Volans: "The intent is that you can install cumin without getting all the dependencies required by the openstack backend and work nicely as it is w" [software/cumin] - 10https://gerrit.wikimedia.org/r/383617 (owner: 10Volans) [21:05:55] gehel: happy to explain it better if it's not clear, and thanks for the comment, it made me realize another small mistake ;) ^^^ [21:07:36] !log running puppet on cache misc hosts to temp remove codfw backend of releases.wm.org, then rebooting releases2001 for kernel upgrade [21:07:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:08:19] PROBLEM - mysqld processes on labtestservices2002 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [21:11:19] (03PS1) 10Dereckson: Revert "Revert "Limit thanks for new users at pl.wikipedia to 3 per day"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383694 (https://phabricator.wikimedia.org/T124841) [21:12:10] maybe labtestservices2002 needs to disable paging [21:13:12] test + 
codfw... why is it paging in the first place [21:13:39] PROBLEM - mysqld processes on labtestservices2003 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [21:13:49] !log releases2001 - scheduled downtime, rebooting [21:13:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:16:41] (03CR) 10Krinkle: "No longer a problem, so long it rolls out *after* https://gerrit.wikimedia.org/r/#/c/383645/ is backported and deployed." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383491 (https://phabricator.wikimedia.org/T169732) (owner: 10Jdlrobson) [21:17:24] it's paging because the monitor check for mariadb processes has "$is_critical = true" by default (mariadb::monitor_process) [21:17:49] so we can disable the notifications for the host, using the new hiera setting for it [21:18:18] or we can add some "if $fdqn starts with test , then is_critical => false" [21:19:00] or we can do it in role/common/whatever_role labtestservics uses [21:19:51] the "is_critical" in this context means paging on/off [21:20:00] doesnt change that it is CRIT for icinga [21:20:14] so still shows in channel and that [21:20:19] +1 for "disable the notifications for the host" to start with it's a test host, shoudn't be in prod at all IMHO [21:20:58] PROBLEM - puppet last run on labtestcontrol2001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [21:21:01] i once suggested a change like that that used regex.yaml in hiera to exclude all things in test [21:21:17] hostnames that start with test- in general [21:21:46] but that was in context of totally skipping all icinga monitoring, not just paging on/off, and that part didnt have consensus [21:23:58] jynus: sorry about that, I'm handling in icinga now it snuck in on me w/ the delayed icinga config application [21:24:02] let's do a regex, there is more than one labtest* [21:24:20] we would start creating one file for each otherwies [21:26:32] we already have some regexes for labtest things, using that [21:26:48] those are all going away [21:27:01] the existing ones anyhow [21:27:35] 124 __regex: !ruby/regexp /^labtest(net200[1-9]\.codfw\.wmnet|(services|control)100[1-9]\.wikimedia\.org)$/ [21:27:49] chasemp: would that still cover them? [21:28:11] missing virt but that's not a big deal [21:28:14] that is what sets "cluster: labtestvirt" [21:28:37] so it can also set "no notifications from icinga"? [21:29:18] (03PS2) 10Greg Grossmeier: Revert "Revert "Limit thanks for new users at pl.wikipedia to 3 per day"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383694 (https://phabricator.wikimedia.org/T169268) (owner: 10Dereckson) [21:29:49] does that kill irc as well? [21:30:44] yes, i think it does [21:30:59] RECOVERY - puppet last run on labtestcontrol2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [21:31:00] there is another way to avoid that but ensure it doesnt send pages [21:31:07] but that would need puppet changes and not just hiera [21:31:36] it's currently just "if a mysql proc isnt running that means send a page" [21:31:52] so if something uses a role with that, it would be on by default [21:32:27] mutante: can you make a task quick with what you think and assign to me? 
I'll look at what we are doing and propose something [21:32:30] this is the stuff for that meeting about paging [21:32:32] i suppose [21:32:48] I want to keep irc but maybe we can move what channel is used or jsut set critial = false for certain roles [21:33:00] chasemp: what is the goal? :) [21:33:16] not send sms pages for labtest* things to non-cloud folks [21:33:17] then i can say what we can do technically to achieve that [21:33:40] the addition of "and based on contact groups" makes it not possible currently afaict [21:34:06] we can axe that part then for the time being and make it non-paging [21:34:18] but my guess is this is not hte last time we sort this out so I want to work through it on a task [21:34:23] for the next few times etc [21:34:52] yea, then just handle it via Icinga downtimes for now [21:35:23] should be done now [21:35:44] the only simple thing would be to also kill IRC messages, i think [21:37:07] (03PS1) 10Volans: Upstream release v1.2.2 [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/383707 [21:39:11] (03PS1) 10RobH: decom of cp4005-8,13-16 [puppet] - 10https://gerrit.wikimedia.org/r/383709 (https://phabricator.wikimedia.org/T176366) [21:39:56] (03CR) 10RobH: [C: 032] decom of cp4005-8,13-16 [puppet] - 10https://gerrit.wikimedia.org/r/383709 (https://phabricator.wikimedia.org/T176366) (owner: 10RobH) [21:42:43] (03PS1) 10Rush: openstack: remove pdns auth things for labtestservices200[23] [puppet] - 10https://gerrit.wikimedia.org/r/383710 (https://phabricator.wikimedia.org/T171494) [21:43:17] (03PS2) 10Rush: openstack: remove pdns auth things for labtestservices200[23] [puppet] - 10https://gerrit.wikimedia.org/r/383710 (https://phabricator.wikimedia.org/T171494) [21:44:35] !log cp4009 shutdown accidental, will boot back up immediately [21:44:43] i got a bit too crazy on my shutdowns. 
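Editorial note on the regex pasted at 21:27:35: it can be checked quickly against hostnames before relying on it for "no notifications". The sketch below ports the Ruby pattern to Python's `re` module (the anchors behave slightly differently in the two languages, which does not matter for single-line hostnames). The log only shows short hostnames, so the fully qualified names used here are assumptions for illustration. Whatever the domain suffix, a name such as labtestservices2002 cannot match the pattern as pasted, because it requires `100[1-9]` after `services` or `control`.

```python
import re

# Python port of the Ruby regex quoted in the log at 21:27:35.
LABTEST_RE = re.compile(
    r"^labtest(net200[1-9]\.codfw\.wmnet"
    r"|(services|control)100[1-9]\.wikimedia\.org)$"
)


def covered(fqdn: str) -> bool:
    """Return True if the hostname matches the pasted labtest regex."""
    return LABTEST_RE.match(fqdn) is not None


def report(hosts):
    """Map each hostname to whether the regex covers it."""
    return {host: covered(host) for host in hosts}


# Hypothetical FQDNs; only the short names appear in the log.
CANDIDATES = [
    "labtestnet2001.codfw.wmnet",
    "labtestservices1001.wikimedia.org",
    "labtestservices2002.wikimedia.org",
    "labtestcontrol2001.wikimedia.org",
    "labtestvirt2001.codfw.wmnet",
]

if __name__ == "__main__":
    for host, ok in report(CANDIDATES).items():
        print(f"{host}: {'covered' if ok else 'NOT covered'}")
```

Under these assumed names, the hosts that alerted in this log (labtestservices2002, labtestservices2003, labtestcontrol2001) would not be covered, which fits the remark that the existing regexes are going away.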
[21:44:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:44:44] (03CR) 10Rush: [C: 032] openstack: remove pdns auth things for labtestservices200[23] [puppet] - 10https://gerrit.wikimedia.org/r/383710 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [21:46:47] (03PS1) 10Dzahn: mariadb/icinga: quick fix, if fqdn like labtest, don't page [puppet] - 10https://gerrit.wikimedia.org/r/383713 [21:47:04] (03PS2) 10Krinkle: remove unused injectrecentChanges option [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371067 (owner: 10Thiemo Mättig (WMDE)) [21:47:08] PROBLEM - Host cp4009 is DOWN: PING CRITICAL - Packet loss = 100% [21:47:28] RECOVERY - puppet last run on labtestservices2003 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:48:32] cp4009 is coming back up. [21:48:41] (03CR) 10Krinkle: [C: 04-1] "80efc4672c274c5 suggests it still works." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371067 (owner: 10Thiemo Mättig (WMDE)) [21:49:09] RECOVERY - Host cp4009 is UP: PING WARNING - Packet loss = 86%, RTA = 78.53 ms [21:49:59] (03CR) 10Volans: [C: 032] Upstream release v1.2.2 [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/383707 (owner: 10Volans) [21:51:09] 10Operations, 10Research: Request public key change for a research fellow - https://phabricator.wikimedia.org/T177889#3673946 (10leila) [21:51:23] 10Operations, 10Research: Request public key change for a research fellow - https://phabricator.wikimedia.org/T177889#3673946 (10leila) [21:51:46] 10Operations, 10Research: Request public key change for a research fellow - https://phabricator.wikimedia.org/T177889#3673946 (10leila) a:05leila>03None [21:52:44] (03Merged) 10jenkins-bot: Upstream release v1.2.2 [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/383707 (owner: 10Volans) [21:53:49] RECOVERY - puppet last run on labtestservices2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 
failures [21:54:00] 10Operations, 10monitoring: ensure that services on labtest machines never create SMS from Icinga (not send sms pages for labtest* things to non-cloud folks) - https://phabricator.wikimedia.org/T178008#3677751 (10Dzahn) [21:54:36] 10Operations, 10ops-ulsfo, 10Traffic, 10hardware-requests, 10Patch-For-Review: Decom cp4005-8,13-16 (8 nodes) - https://phabricator.wikimedia.org/T176366#3677764 (10RobH) xe-2/0/3 up up cp4005 xe-2/0/4 up up cp4006 xe-2/0/5 up up cp4007 xe-2/0/6 up up cp400... [21:54:58] 10Operations, 10ops-ulsfo, 10Traffic, 10hardware-requests, 10Patch-For-Review: Decom cp4005-8,13-16 (8 nodes) - https://phabricator.wikimedia.org/T176366#3677765 (10RobH) [21:55:03] 10Operations, 10monitoring: ensure that services on labtest machines never create SMS from Icinga (not send sms pages for labtest* things to non-cloud folks) - https://phabricator.wikimedia.org/T178008#3677766 (10Dzahn) a:03chasemp [21:55:17] 10Operations, 10monitoring: ensure that services on labtest machines never create SMS from Icinga (not send sms pages for labtest* things to non-cloud folks) - https://phabricator.wikimedia.org/T178008#3677767 (10chasemp) p:05Triage>03Normal [21:55:32] 10Operations, 10monitoring: ensure that services on labtest machines never create SMS from Icinga (not send sms pages for labtest* things to non-cloud folks) - https://phabricator.wikimedia.org/T178008#3677751 (10chasemp) thanks @Dzahn [21:55:34] (03PS2) 10Dzahn: mariadb/icinga: if fqdn like labtest, don't page [puppet] - 10https://gerrit.wikimedia.org/r/383713 [21:56:00] (03CR) 10jerkins-bot: [V: 04-1] mariadb/icinga: if fqdn like labtest, don't page [puppet] - 10https://gerrit.wikimedia.org/r/383713 (owner: 10Dzahn) [22:01:47] !log uploaded cumin_1.2.2-1_amd64.deb to apt.wikimedia.org jessie-wikimedia [22:01:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:04:34] 10Operations, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current), 
10User-Ladsgroup: Review and fix file handle management in worker and celery processes - https://phabricator.wikimedia.org/T174402#3677795 (10awight) @akosiaris We're now estimating that 480 workers will eventually use 220k–1.2M... [22:05:56] (03PS3) 10Dzahn: mariadb/icinga: if fqdn like labtest, don't page [puppet] - 10https://gerrit.wikimedia.org/r/383713 (https://phabricator.wikimedia.org/T178008) [22:08:12] (03CR) 10Dzahn: "this is the quick fix and the example to show what it is about, should probably be done in a nicer way involving Hiera, so updates to the " [puppet] - 10https://gerrit.wikimedia.org/r/383713 (https://phabricator.wikimedia.org/T178008) (owner: 10Dzahn) [22:18:13] 10Operations, 10TechCom-RfC, 10RfC, 10Services (attic), and 2 others: Service Ownership and Maintenance - https://phabricator.wikimedia.org/T122825#3677831 (10GWicke) a:05GWicke>03mobrovac [22:19:12] (03PS4) 10Dzahn: mariadb/icinga: if fqdn like labtest, don't page [puppet] - 10https://gerrit.wikimedia.org/r/383713 (https://phabricator.wikimedia.org/T178008) [22:23:19] 10Operations, 10ops-ulsfo, 10Traffic: cp4026 memory error - https://phabricator.wikimedia.org/T178011#3677852 (10RobH) [22:26:25] 10Operations, 10MediaWiki-Platform-Team, 10Epic, 10Performance-Team (Radar), 10Services (watching): 2017/18 Annual Plan Program 8: Multi-datacenter support, Q2 goals - https://phabricator.wikimedia.org/T175213#3677872 (10GWicke) a:05GWicke>03None [22:26:47] 10Operations, 10MediaWiki-Platform-Team, 10Epic, 10Performance-Team (Radar), 10Services (watching): 2017/18 Annual Plan Program 8: Multi-datacenter support - https://phabricator.wikimedia.org/T175206#3677874 (10GWicke) a:05GWicke>03None [22:29:50] 10Operations, 10RESTBase-API, 10Traffic, 10Patch-For-Review: [feature request] Redirect root API path to docs page - https://phabricator.wikimedia.org/T125226#3677898 (10GWicke) a:05GWicke>03None [22:30:57] 10Operations, 10ops-ulsfo, 10Traffic: cp4026 memory error - 
https://phabricator.wikimedia.org/T178011#3677901 (10RobH) [22:35:44] 10Operations, 10ops-ulsfo, 10Traffic, 10hardware-requests: Decom cp4005-8,13-16 (8 nodes) - https://phabricator.wikimedia.org/T176366#3677915 (10RobH) [22:37:21] (03PS1) 10RobH: decom of cp4005-8,13-16 [dns] - 10https://gerrit.wikimedia.org/r/383721 (https://phabricator.wikimedia.org/T176366) [22:37:54] (03CR) 10RobH: [C: 032] decom of cp4005-8,13-16 [dns] - 10https://gerrit.wikimedia.org/r/383721 (https://phabricator.wikimedia.org/T176366) (owner: 10RobH) [22:38:32] 10Operations, 10ops-ulsfo, 10Traffic, 10hardware-requests, 10Patch-For-Review: Decom cp4005-8,13-16 (8 nodes) - https://phabricator.wikimedia.org/T176366#3677935 (10RobH) [22:39:50] 10Operations, 10ops-ulsfo, 10Traffic: cp4026 memory error - https://phabricator.wikimedia.org/T178011#3677938 (10BBlack) This will self-depool if you do a clean shutdown from software. We just need to verify + repool manually afterwards. [22:44:44] (03PS5) 10Dzahn: mariadb/icinga: if fqdn like labtest, don't page [puppet] - 10https://gerrit.wikimedia.org/r/383713 (https://phabricator.wikimedia.org/T178008) [22:44:47] (03PS1) 10Dzahn: cache::misc/releases: switch backend to codfw [puppet] - 10https://gerrit.wikimedia.org/r/383722 [22:46:22] (03PS2) 10Dzahn: cache::misc/releases: switch backend to codfw [puppet] - 10https://gerrit.wikimedia.org/r/383722 [22:47:49] (03CR) 10BBlack: [C: 04-1] cache::misc/releases: switch backend to codfw [puppet] - 10https://gerrit.wikimedia.org/r/383722 (owner: 10Dzahn) [22:48:52] it always needs to have eqiad ? [22:48:59] i did that before, didnt i, heh [22:49:19] mutante: so you had active/active before, and you went eqiad-only earlier to work on codfw [22:49:39] right? and now you've got a working one in codfw, and you want to take down eqiad for work? 
[22:49:45] yes, correct [22:49:58] and as last step go back to active/active [22:50:03] so, basically, flip it back to active/active first and deploy that on the cache_misc's [22:50:09] then pull eqiad out [22:50:18] (then put it back when you're done) [22:50:49] 10Operations, 10Analytics-Kanban, 10Discovery, 10Discovery-Analysis (Current work), and 2 others: Can't install R package Boom (& bsts) on stat1002 (but can on stat1003) - https://phabricator.wikimedia.org/T147682#3677987 (10debt) [22:50:52] if do a direct one-step switch from all-codfw -> all-eqiad (or vice-versa), it will create temporary routing loops as the puppet patch takes effect asynchronously on the cluster [22:51:19] (which will be prevented from turning into infinite loops by our loop-protection VCL, instead returning 5xx errors to users until the window of problem is over) [22:51:27] bblack: .. and now i remember this from before, heh, yes :) thanks [22:51:45] sorry, I know it's not intuitive the way it works. Someday! :) [22:51:52] thanks for catching it [22:52:29] the first instinct was actually to hit the revert button on gerrit [22:52:42] but then i was "no, wait.. 
you need to do eqiad anyways", hah [22:53:03] (03PS1) 10Dzahn: Revert "cache::misc/releases: tmp disable codfw backend" [puppet] - 10https://gerrit.wikimedia.org/r/383723 [22:53:23] tries to memorize the "just use reverts" part [22:54:35] (03PS2) 10Dzahn: Revert "cache::misc/releases: tmp disable codfw backend" [puppet] - 10https://gerrit.wikimedia.org/r/383723 [22:54:48] (03Abandoned) 10Dzahn: cache::misc/releases: switch backend to codfw [puppet] - 10https://gerrit.wikimedia.org/r/383722 (owner: 10Dzahn) [22:55:36] (03CR) 10Dzahn: [C: 032] "making it active-active again after codfw is back up" [puppet] - 10https://gerrit.wikimedia.org/r/383723 (owner: 10Dzahn) [22:58:54] runs puppet on misc cache via personal bash alias for the cumin command [22:59:24] :) [23:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor My software never has bugs. It just develops random features. Rise for Evening SWAT (Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171011T2300). [23:00:05] James_F and Jdlrobson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:12] here [23:00:37] Hello! 
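Editorial note on the backend switch discussed from 22:48 onward: the rule bblack describes is to never go directly from "all in one datacenter" to "all in the other", because the puppet change lands asynchronously across the cache hosts and can cause temporary routing loops. Instead, pass through active/active first. The sketch below only illustrates that ordering; `plan_transition` is a hypothetical helper and not part of any Wikimedia tooling. It assumes that a change between two backend sets sharing at least one datacenter is safe to apply in one step, which matches what was done in this log but is not stated as a general guarantee.

```python
def plan_transition(current, target):
    """Return the ordered list of backend sets to deploy, one per step.

    If `current` and `target` share no datacenter, an intermediate
    step with both sets enabled (active/active) is inserted first.
    """
    current, target = frozenset(current), frozenset(target)
    if not current or not target:
        raise ValueError("at least one backend must stay enabled")
    if current == target:
        return []  # nothing to do
    if current & target:
        # Overlapping sets: assumed safe as a single step.
        return [target]
    # Disjoint sets: go through active/active before pulling the old side.
    return [current | target, target]


if __name__ == "__main__":
    for step in plan_transition({"eqiad"}, {"codfw"}):
        print(sorted(step))
```

Each returned step corresponds to one merged patch followed by a puppet run on all cache_misc hosts before the next step is deployed.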
[23:03:04] I can SWAT [23:05:27] (03PS1) 10Dzahn: cache::misc/releases: tmp disable eqiad backend [puppet] - 10https://gerrit.wikimedia.org/r/383727 [23:06:23] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383652 (https://phabricator.wikimedia.org/T176150) (owner: 10Jforrester) [23:06:50] thcipriani: hey just heads up this is the "third time lucky" for the patch i tried to swat yesterday :) [23:07:02] pretty confident this time though that we'll see no log errors :) [23:07:15] jdlrobson: heh, I noticed, thank you for the heads-up :) [23:10:44] (03CR) 10Dzahn: [C: 032] "puppet definitely ran on cache::misc, so was back to active:active" [puppet] - 10https://gerrit.wikimedia.org/r/383727 (owner: 10Dzahn) [23:12:36] does the whole dance for the perfect uptime stats on releases.wm.org :p.. actual reboot is so quick.. :) [23:12:53] wont do it (and cant) for the netmon tools, but that's not public facing [23:13:22] James_F: fix unblocking autoblocks is live for wmf.{2,3} on mwdebug1002, check please [23:15:35] thcipriani: Tested, LGTM. [23:15:44] ok, going live wmf.3 first [23:17:47] !log thcipriani@tin Synchronized php-1.31.0-wmf.3/includes/specials/SpecialUnblock.php: SWAT: [[gerrit:383634|Fix unblocking autoblocks]] T177952 (duration: 00m 51s) [23:17:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:17:57] T177952: Can not unblock / remove autoblocks - https://phabricator.wikimedia.org/T177952 [23:18:18] !log releases1001 upgrading kernel, installed linux-image-amd64, took out of cache::misc, upgrade libnss3, scheduled downtime, rebooting ... 
[23:18:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:19:06] (03Merged) 10jenkins-bot: Parser: Switch from Tidy to Remex on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383652 (https://phabricator.wikimedia.org/T176150) (owner: 10Jforrester) [23:19:07] !log thcipriani@tin Synchronized php-1.31.0-wmf.2/includes/specials/SpecialUnblock.php: SWAT: [[gerrit:383635|Fix unblocking autoblocks]] T177952 (duration: 00m 50s) [23:19:14] (03CR) 10jenkins-bot: Parser: Switch from Tidy to Remex on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383652 (https://phabricator.wikimedia.org/T176150) (owner: 10Jforrester) [23:19:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:20:39] (03PS1) 10Dzahn: Revert "cache::misc/releases: tmp disable eqiad backend" [puppet] - 10https://gerrit.wikimedia.org/r/383729 [23:20:46] James_F: Fix WikiEditor mode switcher widget is live on mwdebug1002, check please [23:21:29] thcipriani: Yup, LGTM. 
[23:21:35] going live [23:21:37] (03CR) 10Dzahn: [C: 032] "back to active active after both are upgraded" [puppet] - 10https://gerrit.wikimedia.org/r/383729 (owner: 10Dzahn) [23:23:03] 10Operations, 10ops-ulsfo, 10Traffic, 10hardware-requests, 10Patch-For-Review: Decom cp4005-8,13-16 (8 nodes) - https://phabricator.wikimedia.org/T176366#3678113 (10RobH) [23:23:47] !log thcipriani@tin Synchronized php-1.31.0-wmf.3/extensions/VisualEditor/modules/ve-mw/init/styles/ve.init.MWVESwitchConfirmDialog.css: SWAT: [[gerrit:383618|Fix WikiEditor mode switcher widget]] (duration: 00m 50s) [23:23:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:24:16] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383653 (https://phabricator.wikimedia.org/T177989) (owner: 10Jforrester) [23:25:17] James_F: Parser: Switch from Tidy to Remex on fawiki is live on mwdebug1002, check please [23:25:31] bblack: all done and now i feel like about one user somewhere in the world did not see an error message when he tried to download mediawiki in those 10 seconds it felt a reboot of those Ganeti VMs takes. heh. but also i dont have a cached error page , right :) [23:25:58] thcipriani: LGTM. 
[23:26:06] going live [23:26:18] it was just so fast it almost feels like not worth the multiple changes [23:26:31] but also i dont really know how much traffic it gets at all [23:26:53] (03Merged) 10jenkins-bot: Parser: Switch from Tidy to Remex on nowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383653 (https://phabricator.wikimedia.org/T177989) (owner: 10Jforrester) [23:26:54] rebooting a VM is so much faster than metal [23:27:02] (03CR) 10jenkins-bot: Parser: Switch from Tidy to Remex on nowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383653 (https://phabricator.wikimedia.org/T177989) (owner: 10Jforrester) [23:27:53] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:383652|Parser: Switch from Tidy to Remex on fawiki]] T176150 (duration: 00m 50s) [23:28:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:28:01] T176150: Enable RemexHTML on fawiki - https://phabricator.wikimedia.org/T176150 [23:28:20] James_F: Parser: Switch from Tidy to Remex on nowiki is live on mwdebug1002, check please [23:29:05] thcipriani: LGTM. [23:29:10] going live [23:31:02] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:383653|Parser: Switch from Tidy to Remex on nowiki]] T177989 (duration: 00m 50s) [23:31:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:31:10] T177989: Enable RemexHTML on nowiki - https://phabricator.wikimedia.org/T177989 [23:31:35] !log netmon1003 (servermon) - upgrading kernel (jessie), schedule icinga downtime, rebooting, short downtime for servermon itself [23:31:37] thcipriani: Thanks so much. 
[23:31:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:32:15] James_F: yw, thanks for all the sanity checks :) [23:33:38] jdlrobson: Print logo should use an absolute URI is live for wmf.{2,3} on mwdebug1002, check please [23:33:48] thcipriani: sweettttt [23:34:28] sync away [23:34:45] going [23:35:50] !log netmon1003 - cant seem to reboot normally, attempting via gnt-instance command [23:35:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:37:56] !log thcipriani@tin Synchronized php-1.31.0-wmf.3/skins/Vector/ResourceLoaderLessModule.php: SWAT: [[gerrit:383706|Print logo should use an absolute URI]] T177800 (duration: 00m 50s) [23:38:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:38:04] T177800: Logo doesn't show on print styles on test wikipedia - https://phabricator.wikimedia.org/T177800 [23:38:06] !log netmon1003 - servermon back up - apt-get autoremove to drop unrequired python packages [23:38:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:40:27] !log thcipriani@tin Synchronized php-1.31.0-wmf.2/skins/Vector/ResourceLoaderLessModule.php: SWAT: [[gerrit:383706|Print logo should use an absolute URI]] T177800 (duration: 00m 49s) [23:40:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:40:50] (03PS2) 10Thcipriani: Enable Vector print logo and print styles on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383491 (https://phabricator.wikimedia.org/T169732) (owner: 10Jdlrobson) [23:42:21] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383491 (https://phabricator.wikimedia.org/T169732) (owner: 10Jdlrobson) [23:43:10] (03PS1) 10Volans: Documentation: add ReadTheDocs configuration [software/cumin] - 10https://gerrit.wikimedia.org/r/383735 [23:44:31] (03Merged) 10jenkins-bot: Enable Vector print logo and print styles on all wikis
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/383491 (https://phabricator.wikimedia.org/T169732) (owner: 10Jdlrobson) [23:45:23] jdlrobson: okie doke ^ is live on mwdebug1002, check please [23:45:31] cheeccking [23:45:49] yeeeyjaaaah this is looking much more promising [23:45:53] :) [23:46:10] (03CR) 10jenkins-bot: Enable Vector print logo and print styles on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383491 (https://phabricator.wikimedia.org/T169732) (owner: 10Jdlrobson) [23:46:25] make it live! not seeing any of those errors this time [23:46:44] yeap. canary logs look clean, going live :) [23:47:19] !log netmon2001 (smokeping) - kernel upgrade, schedule downtime, reboot [23:47:21] jdlrobson: anything going to break if I sync CommonSettings then InitialiseSettings? [23:47:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:47:41] thcipriani: i'd hope not [23:47:56] that would be correct order [23:48:02] ok, didn't think so, just thought I'd check as that's my plan :) [23:48:02] print logo is useless without InitialiseSettings [23:49:39] * mutante thinks why is this not coming back.. 
omg lol, it's a HP, now it makes sense [23:50:49] !log thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:383491|Enable Vector print logo and print styles on all wikis]] T169732 PART I (duration: 00m 50s) [23:50:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:50:57] T169732: Deploy new desktop print styles on all projects - https://phabricator.wikimedia.org/T169732 [23:51:03] RECOVERY - Check systemd state on netmon2001 is OK: OK - running: The system is fully operational [23:51:06] wooooo sexy print styles [23:52:00] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:383491|Enable Vector print logo and print styles on all wikis]] T169732 PART II (duration: 00m 50s) [23:52:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:52:09] ^ jdlrobson all live congrats :) [23:52:18] thcipriani: woo going to print [23:52:22] :D [23:52:29] everyone should feel free to print their favourite wikipedia article on desktop and see how much nicer it looks [23:52:33] (03CR) 10Volans: [C: 032] Documentation: add ReadTheDocs configuration [software/cumin] - 10https://gerrit.wikimedia.org/r/383735 (owner: 10Volans) [23:53:02] * thcipriani queues up https://en.wikipedia.org/wiki/List_of_English_terms_of_venery,_by_animal [23:53:04] thcipriani: verified! [23:53:53] nice, thanks [23:54:48] and not seeing anything in logstash. are you? [23:54:57] awight: I saw your "Are there any special steps to wire a MediaWiki LoggerFactory into logstash-beta" Q in the SoS notes. Did you get help with that yet?
[23:55:54] !log netmon2001 - smokeping back up - the reboot, as a side-effect, also resolved the systemd-icinga-alert that was pending in Icinga for a while with comments [23:56:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:56:46] bd808: I haven’t figured out the plumbing yet, but I was able to read the file logs on fluorine-beta so I’m good for now. Thanks! [23:57:08] I can give plumbing tips and tricks if you'd like :) [23:57:30] or a doc link? [23:57:33] * bd808 was responsible for a large portion of that mess [23:57:58] lol thanks a lot :p [23:58:20] awight: the closest thing to a doc is the header comment on https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/logging.php [23:58:43] (03Merged) 10jenkins-bot: Documentation: add ReadTheDocs configuration [software/cumin] - 10https://gerrit.wikimedia.org/r/383735 (owner: 10Volans) [23:59:28] basically the channel needs to be a key in $wmgMonologChannels and the value there controls where the log events are routed
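The routing rule described in the last message can be pictured with a small sketch. This is a Python illustration only, not the PHP in wmf-config/logging.php: the channel names, the destination names, and the `route` helper are all hypothetical. It assumes nothing beyond what was said above, namely that a channel has to be a key in the channel map and that the value stored there decides where its log events go. What the real config does with unlisted channels is not covered in the discussion, so the sketch simply reports them as unconfigured.

```python
# Hypothetical stand-in for $wmgMonologChannels: channel name -> routing value.
# The real structure is PHP and lives in wmf-config; these entries are made up.
CHANNELS = {
    "ExampleChannel": ["logstash", "file"],  # assumed destination names
    "QuietChannel": [],                      # listed, but routed nowhere
}


def route(channel, channels=CHANNELS):
    """Return the destinations for a channel, or None if it is not a key."""
    if channel not in channels:
        # Not a key in the map, so this sketch treats it as unconfigured.
        return None
    # Copy the list so callers cannot mutate the shared configuration.
    return list(channels[channel])


if __name__ == "__main__":
    for name in ("ExampleChannel", "QuietChannel", "NotListed"):
        print(name, "->", route(name))
```

Looking a channel up in this map before emitting events is a quick way to tell a "not listed" problem apart from a "listed but routed somewhere unexpected" problem.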