[00:01:19] (03CR) 10Dzahn: [C: 04-1] "i tried a real restart. it fails with "Mar 16 23:59:49 ruthenium nginx[20720]: nginx: [emerg] invalid number of arguments in "proxy_pass" " [puppet] - 10https://gerrit.wikimedia.org/r/343099 (https://phabricator.wikimedia.org/T159995) (owner: 10Subramanya Sastry) [00:01:22] (03PS1) 10Krinkle: readme: Update documentation about current directory structure [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343219 [00:02:42] (03PS2) 10Krinkle: readme: Update documentation about current directory structure [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343219 [00:03:11] !log T111113: Rolling restarts of Cassandra on restbase1010 [00:03:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:03:17] T111113: Cassandra client encryption - https://phabricator.wikimedia.org/T111113 [00:07:39] (03CR) 10Dzahn: "that last thing was my fault for copying from .erb template and not replacing a variable with its value. it does restart ok." 
[puppet] - 10https://gerrit.wikimedia.org/r/343099 (https://phabricator.wikimedia.org/T159995) (owner: 10Subramanya Sastry) [00:08:00] (03PS4) 10Dzahn: Update ruthenium nginx conf to handle updated parsoid test domains [puppet] - 10https://gerrit.wikimedia.org/r/343099 (https://phabricator.wikimedia.org/T159995) (owner: 10Subramanya Sastry) [00:12:41] (03CR) 10Dzahn: [C: 032] "from tin:" [puppet] - 10https://gerrit.wikimedia.org/r/343099 (https://phabricator.wikimedia.org/T159995) (owner: 10Subramanya Sastry) [00:13:10] !log T111113: Rolling restarts of Cassandra on restbase1011 [00:13:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:13:17] T111113: Cassandra client encryption - https://phabricator.wikimedia.org/T111113 [00:14:58] (03CR) 10Dzahn: "also localhost: [ruthenium:~] $ curl -H "Host: parsoid-vd-tests.wikimedia.org" http://localhost:8001 2>/dev/null| head -n2" [puppet] - 10https://gerrit.wikimedia.org/r/343099 (https://phabricator.wikimedia.org/T159995) (owner: 10Subramanya Sastry) [00:17:12] (03CR) 10Dzahn: "so you are saying you might not need this at all? that would be good. also: polygerrit is still far away for us, right. can you add me aga" [puppet] - 10https://gerrit.wikimedia.org/r/340900 (https://phabricator.wikimedia.org/T156120) (owner: 10Paladox) [00:19:14] (03CR) 10Dzahn: "do you need to control read/write separate? is this to fix a problem in labs?" 
[puppet] - 10https://gerrit.wikimedia.org/r/342276 (owner: 10Paladox) [00:20:51] (03CR) 10Dzahn: [C: 04-1] "see inline comment, looks up "read" value in Hiera but uses it for "write" variable" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342276 (owner: 10Paladox) [00:23:07] !log T111113: Rolling restarts of Cassandra on restbase1016 [00:23:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:23:13] T111113: Cassandra client encryption - https://phabricator.wikimedia.org/T111113 [00:26:12] starting a belated SWAT item [00:26:12] (03CR) 10Dzahn: [C: 04-1] "> When uninstalling with the service already working it will fail to stop the service" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/342082 (owner: 10Paladox) [00:27:52] 06Operations, 10ops-eqiad: decom ytterbium (datacenter) - https://phabricator.wikimedia.org/T141415#3108768 (10RobH) a:05RobH>03Cmjohnson [00:28:52] 06Operations, 10ops-eqiad: decom ytterbium (datacenter) - https://phabricator.wikimedia.org/T141415#2497714 (10RobH) The switch port was not disabled, so glad we're adding the checklist to any open decom tasks. 
[00:29:10] 06Operations, 10ops-eqiad: decom ytterbium (datacenter) - https://phabricator.wikimedia.org/T141415#3108774 (10RobH) [00:30:07] (03CR) 10Dzahn: [C: 031] "yes, i removed most (all) of these myself from DNS in the past" [puppet] - 10https://gerrit.wikimedia.org/r/285084 (https://phabricator.wikimedia.org/T105981) (owner: 10Alex Monk) [00:30:14] (03PS2) 10Dzahn: Get rid of redirects for non-resolving/parked domains [puppet] - 10https://gerrit.wikimedia.org/r/285084 (https://phabricator.wikimedia.org/T105981) (owner: 10Alex Monk) [00:30:40] (03PS3) 10Dzahn: apache: Get rid of redirects for non-resolving/parked domains [puppet] - 10https://gerrit.wikimedia.org/r/285084 (https://phabricator.wikimedia.org/T105981) (owner: 10Alex Monk) [00:32:29] (03CR) 10Dzahn: "i would suggest we take notice that you won't maintain it anymore but also don't delete it, and least not yet and see where we get from th" [puppet] - 10https://gerrit.wikimedia.org/r/340164 (owner: 10Dduvall) [00:34:09] !log T111113: Rolling restarts of Cassandra, eqiad, rack 'b' [00:34:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:34:15] T111113: Cassandra client encryption - https://phabricator.wikimedia.org/T111113 [00:36:33] (03PS1) 10Ejegg: Enable mcrypt extension on CI slaves [puppet] - 10https://gerrit.wikimedia.org/r/343223 [00:36:41] Howdy opsen! [00:36:52] does that patch do what's on the tin? ^^^ [00:37:06] And if so, anybody feel like merging/deploying? [00:37:20] on the tin? 
:P [00:37:36] err, on the commit message [00:37:45] i think it's an old british-ism [00:37:56] oh [00:38:02] haha [00:38:04] https://en.wikipedia.org/wiki/Does_exactly_what_it_says_on_the_tin [00:38:12] lol [00:38:25] whoa, not so old after all [00:38:25] * Platonides was thinking it was related to the server 'tin' [00:38:38] Yeah, I know what it meant when you explained it a bit (I am british ;)) [00:38:41] Ditto [00:38:50] That overrode that in context :P [00:39:53] hehe, anyway, is that the right way to get mcrypt installed & enabled on integration-slave-* ? [00:40:06] !log ebernhardson@tin Synchronized php-1.29.0-wmf.16/extensions/WikimediaEvents/modules/ext.wikimediaEvents.searchSatisfaction.js: SWAT: enabled sister search AB test on 8 wikis (duration: 00m 43s) [00:40:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:40:12] late swat :P [00:41:39] !log ebernhardson@tin Synchronized php-1.29.0-wmf.16/resources/src/mediawiki.special/: SWAT: Fix search result percentage width when no interwiki sidebar shown (duration: 00m 42s) [00:41:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:43:25] context: fr-tech's got a stale branch of the civicrm-buildkit tool which sets up fresh Civi installs for tests [00:43:49] and the upstream version's now puking if its favorite extensions are not installed [00:44:32] seeing as how mcrypt is deprecated in php7, we should also check if that requirement's on the chopping block [00:44:55] but for now, it'd be nice to have the upstream version of the tool work [00:45:18] https://gerrit.wikimedia.org/r/336960 [00:45:44] PROBLEM - cassandra-a CQL 10.64.32.202:9042 on restbase1012 is CRITICAL: connect to address 10.64.32.202 and port 9042: Connection refused [00:46:44] RECOVERY - cassandra-a CQL 10.64.32.202:9042 on restbase1012 is OK: TCP OK - 0.000 second response time on 10.64.32.202 port 9042 [00:49:54] PROBLEM - puppet last run on db1082 is CRITICAL: 
CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:55:34] PROBLEM - cassandra-a CQL 10.64.32.205:9042 on restbase1013 is CRITICAL: connect to address 10.64.32.205 and port 9042: Connection refused [00:56:34] RECOVERY - cassandra-a CQL 10.64.32.205:9042 on restbase1013 is OK: TCP OK - 0.000 second response time on 10.64.32.205 port 9042 [00:58:44] PROBLEM - cassandra-b CQL 10.64.32.206:9042 on restbase1013 is CRITICAL: connect to address 10.64.32.206 and port 9042: Connection refused [00:59:44] RECOVERY - cassandra-b CQL 10.64.32.206:9042 on restbase1013 is OK: TCP OK - 0.000 second response time on 10.64.32.206 port 9042 [01:00:54] PROBLEM - puppet last run on db1060 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:12:18] !log T111113: Rolling restarts of Cassandra, eqiad, rack 'd' [01:12:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:12:25] T111113: Cassandra client encryption - https://phabricator.wikimedia.org/T111113 [01:17:54] PROBLEM - cassandra-b CQL 10.64.48.130:9042 on restbase1009 is CRITICAL: connect to address 10.64.48.130 and port 9042: Connection refused [01:17:55] RECOVERY - puppet last run on db1082 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [01:18:54] RECOVERY - cassandra-b CQL 10.64.48.130:9042 on restbase1009 is OK: TCP OK - 0.000 second response time on 10.64.48.130 port 9042 [01:19:16] (03PS1) 10Krinkle: errorpage: Document how the different files are used [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343227 (https://phabricator.wikimedia.org/T113114) [01:27:54] RECOVERY - puppet last run on db1060 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [01:37:04] PROBLEM - cassandra-b CQL 10.64.48.139:9042 on restbase1015 is CRITICAL: connect to address 10.64.48.139 and port 9042: Connection refused [01:38:04] RECOVERY - cassandra-b CQL 10.64.48.139:9042 
on restbase1015 is OK: TCP OK - 0.000 second response time on 10.64.48.139 port 9042 [01:54:40] !log T111113: Rolling restarts of Cassandra complete [01:54:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:54:47] T111113: Cassandra client encryption - https://phabricator.wikimedia.org/T111113 [01:57:24] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [02:27:24] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [02:33:48] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.16) (duration: 12m 12s) [02:33:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:38:54] PROBLEM - puppet last run on lithium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [02:39:10] !log l10nupdate@tin ResourceLoader cache refresh completed at Fri Mar 17 02:39:10 UTC 2017 (duration 5m 22s) [02:39:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:40:17] (03PS3) 10Krinkle: readme: Update documentation about current directory structure [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343219 [02:40:19] (03CR) 10Krinkle: [C: 032] readme: Update documentation about current directory structure [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343219 (owner: 10Krinkle) [02:40:25] (03CR) 10Krinkle: [C: 032] readme: Update documentation about current directory structure [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343219 (owner: 10Krinkle) [02:40:30] (03PS2) 10Krinkle: errorpage: Document how the different files are used [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343227 (https://phabricator.wikimedia.org/T113114) [02:40:43] (03CR) 10Krinkle: [C: 032] "no-op, doc txt file change" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343227 
(https://phabricator.wikimedia.org/T113114) (owner: 10Krinkle) [02:41:34] (03Merged) 10jenkins-bot: readme: Update documentation about current directory structure [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343219 (owner: 10Krinkle) [02:41:44] (03CR) 10jenkins-bot: readme: Update documentation about current directory structure [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343219 (owner: 10Krinkle) [02:41:47] (03Merged) 10jenkins-bot: errorpage: Document how the different files are used [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343227 (https://phabricator.wikimedia.org/T113114) (owner: 10Krinkle) [02:43:45] (03CR) 10jenkins-bot: errorpage: Document how the different files are used [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343227 (https://phabricator.wikimedia.org/T113114) (owner: 10Krinkle) [03:06:54] RECOVERY - puppet last run on lithium is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [03:13:23] (03PS1) 10Krinkle: speed-tests: Remove old/unused files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343241 [03:13:33] (03CR) 10Krinkle: [C: 032] speed-tests: Remove old/unused files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343241 (owner: 10Krinkle) [03:15:24] (03Merged) 10jenkins-bot: speed-tests: Remove old/unused files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343241 (owner: 10Krinkle) [03:16:50] (03CR) 10jenkins-bot: speed-tests: Remove old/unused files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343241 (owner: 10Krinkle) [03:54:04] PROBLEM - puppet last run on db1053 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [04:11:04] PROBLEM - MariaDB Slave Lag: m3 on db1048 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 370.00 seconds [04:23:04] RECOVERY - puppet last run on db1053 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [04:24:04] PROBLEM - puppet last run on elastic1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [04:52:04] RECOVERY - puppet last run on elastic1051 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [05:06:34] PROBLEM - puppet last run on cp3037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [05:09:33] (03PS1) 10Catrope: statistics/cruncher: Add reportupdater job for edit-beta-features [puppet] - 10https://gerrit.wikimedia.org/r/343246 [05:09:58] (03CR) 10jerkins-bot: [V: 04-1] statistics/cruncher: Add reportupdater job for edit-beta-features [puppet] - 10https://gerrit.wikimedia.org/r/343246 (owner: 10Catrope) [05:10:19] (03CR) 10Catrope: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/343246 (owner: 10Catrope) [05:26:16] (03PS2) 10Catrope: statistics/cruncher: Add reportupdater job for edit-beta-features [puppet] - 10https://gerrit.wikimedia.org/r/343246 [05:32:04] PROBLEM - puppet last run on mw1288 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [05:35:34] RECOVERY - puppet last run on cp3037 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [05:54:45] 06Operations, 10ops-eqiad, 10hardware-requests: decom ytterbium (datacenter) - https://phabricator.wikimedia.org/T141415#3109058 (10Dzahn) [05:59:24] 06Operations, 10DNS, 10Parsoid, 10Traffic, 13Patch-For-Review: Separate subdomain for parsoid visual diff test service on ruthenium - https://phabricator.wikimedia.org/T159995#3109059 (10Dzahn) this should be done now and work:) right, @ssastry ? 
copied over from gerrit:343099: ``` from tin: [tin:~]... [06:00:15] RECOVERY - puppet last run on mw1288 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [06:03:14] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:04:59] 06Operations, 10DNS, 10Parsoid, 10Traffic, 13Patch-For-Review: Separate subdomain for parsoid visual diff test service on ruthenium - https://phabricator.wikimedia.org/T159995#3109061 (10Dzahn) 05Open>03Resolved [06:05:51] 06Operations, 10DNS, 10Parsoid, 10Traffic: Separate subdomain for parsoid visual diff test service on ruthenium - https://phabricator.wikimedia.org/T159995#3085873 (10Dzahn) a:05ssastry>03Dzahn [06:06:01] 06Operations, 10DNS, 10Parsoid, 10Traffic: Separate subdomain for parsoid visual diff test service on ruthenium - https://phabricator.wikimedia.org/T159995#3085873 (10Dzahn) please reopen if anything is missing [06:09:14] RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [06:10:14] PROBLEM - puppet last run on aqs1006 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [06:14:17] (03CR) 10Dzahn: [C: 031] "bump, review for this would be nice" [dns] - 10https://gerrit.wikimedia.org/r/341359 (https://phabricator.wikimedia.org/T158638) (owner: 10Dzahn) [06:15:17] (03PS6) 10Dzahn: change MX records for wikimedia.ee from elkdata.ee to Google [dns] - 10https://gerrit.wikimedia.org/r/341359 (https://phabricator.wikimedia.org/T158638) [06:39:14] RECOVERY - puppet last run on aqs1006 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [07:02:51] 06Operations, 10ops-codfw, 13Patch-For-Review, 15User-Elukey: codfw: mw2251-mw2260 rack/setup - https://phabricator.wikimedia.org/T155180#3109076 (10elukey) @mmodell: mw2256 should be shutdown now, we are not planning to bring it up again until we'll have the new DIMM :) [07:12:25] (03PS1) 10Marostegui: db-eqiad.php: Increase weight db1070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343250 (https://phabricator.wikimedia.org/T157931) [07:13:37] 06Operations, 10ops-eqiad: Reset db1070 idrac - https://phabricator.wikimedia.org/T160392#3109081 (10Marostegui) [07:13:39] 06Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1070 - https://phabricator.wikimedia.org/T158969#3109079 (10Marostegui) 05Open>03Resolved It is all good now, thank you Chris! ``` root@db1070:~# megacli -PDRbld -ShowProg -PhysDrv [32:10] -aALL Device(Encl-32 Slot-10) is not in rebuild process Exi... 
[07:21:48] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight db1070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343250 (https://phabricator.wikimedia.org/T157931) (owner: 10Marostegui) [07:23:43] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight db1070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343250 (https://phabricator.wikimedia.org/T157931) (owner: 10Marostegui) [07:23:57] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight db1070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343250 (https://phabricator.wikimedia.org/T157931) (owner: 10Marostegui) [07:24:55] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase weight for db1070 - T157931 (duration: 00m 45s) [07:25:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:25:02] T157931: s5: db1070 not using file per table - https://phabricator.wikimedia.org/T157931 [07:29:19] 06Operations, 10ops-eqiad, 10DBA, 10Phabricator: db1048 BBU broken - slave lagging - https://phabricator.wikimedia.org/T160731#3109087 (10Marostegui) [07:29:24] PROBLEM - puppet last run on mc1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:34:11] (03PS2) 10Muehlenhoff: Change email address for Wes Moran [puppet] - 10https://gerrit.wikimedia.org/r/341283 [07:36:35] 06Operations, 10ops-eqiad, 10DBA, 10Phabricator: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3109103 (10Marostegui) [07:37:01] (03CR) 10Muehlenhoff: [C: 032] Change email address for Wes Moran [puppet] - 10https://gerrit.wikimedia.org/r/341283 (owner: 10Muehlenhoff) [07:37:24] PROBLEM - puppet last run on restbase1013 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [07:40:36] (03PS1) 10Muehlenhoff: Fix -turn-volunteer mode in LDAP offboarding script [puppet] - 10https://gerrit.wikimedia.org/r/343252 [07:42:05] (03CR) 10jerkins-bot: [V: 04-1] Fix -turn-volunteer mode in LDAP offboarding script [puppet] - 10https://gerrit.wikimedia.org/r/343252 (owner: 10Muehlenhoff) [07:42:22] 06Operations, 10ops-eqiad, 10DBA, 10Phabricator: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3109104 (10Marostegui) I have manually forced a BBU learn cycle and it is now looking fine: ``` root@db1048:~# megacli -AdpBbuCmd -BbuLearn -aALL -NoLog Adapter 0: BBU Learn Succ... [07:42:29] 06Operations, 10ops-eqiad, 10DBA, 10Phabricator: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3109105 (10Marostegui) 05Open>03Resolved a:03Marostegui [07:50:04] RECOVERY - MariaDB Slave Lag: m3 on db1048 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [07:50:40] (03PS1) 10Marostegui: Revert "db-codfw.php: Depool db2065" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343253 [07:50:44] (03PS2) 10Marostegui: Revert "db-codfw.php: Depool db2065" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343253 [07:53:04] (03PS2) 10Muehlenhoff: Fix --turn-volunteer mode in LDAP offboarding script [puppet] - 10https://gerrit.wikimedia.org/r/343252 [07:54:46] 06Operations, 10ops-eqiad, 10DBA, 10Phabricator: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3109087 (10jcrespo) Do you think we should force a learning cycle to db1047 T159266 ? [07:55:37] 06Operations, 10ops-eqiad, 10DBA, 10Phabricator: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3109118 (10Marostegui) I just tried - we will see! 
[07:56:36] 06Operations, 10ops-eqiad, 10DBA, 10Phabricator: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3109119 (10Marostegui) But db1047 one has a different (and more worrying error) for BBU a1: ``` Battery State: Failed ``` [07:57:01] (03CR) 10Muehlenhoff: [C: 032] Fix --turn-volunteer mode in LDAP offboarding script [puppet] - 10https://gerrit.wikimedia.org/r/343252 (owner: 10Muehlenhoff) [07:57:07] (03PS3) 10Muehlenhoff: Fix --turn-volunteer mode in LDAP offboarding script [puppet] - 10https://gerrit.wikimedia.org/r/343252 [07:57:24] RECOVERY - puppet last run on mc1022 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [07:57:56] (03CR) 10Marostegui: [C: 032] Revert "db-codfw.php: Depool db2065" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343253 (owner: 10Marostegui) [07:59:13] (03Merged) 10jenkins-bot: Revert "db-codfw.php: Depool db2065" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343253 (owner: 10Marostegui) [07:59:22] (03CR) 10jenkins-bot: Revert "db-codfw.php: Depool db2065" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343253 (owner: 10Marostegui) [08:00:16] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2065 - T160415 - T73563 (duration: 00m 44s) [08:00:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:00:24] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [08:00:24] T73563: *_minor_mime are varbinary(32) on WMF sites, out of sync with varbinary(100) in MW core - https://phabricator.wikimedia.org/T73563 [08:03:27] (03PS1) 10Marostegui: db-codfw.php: Depool db2058 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343255 (https://phabricator.wikimedia.org/T160415) [08:05:24] RECOVERY - puppet last run on restbase1013 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [08:09:54] does someone know a quick 
way to know all deployed extensions on production wikis? [08:11:18] wmf-config/extension-list would be the canonical place ? [08:11:23] jynus: [operations/mediawiki-config]/wmf-config/extension-list probably [08:11:36] yeah [08:11:43] thanks, MatmaRex ! [08:12:19] I assume those could be enabled, but disabled with a switch on config :-/ [08:13:16] yeahhh [08:13:20] what do you need this list for? [08:13:58] knowing if a table should be on which wikis [08:14:13] an optional table for an extension [08:15:25] but if an extension has been enabled but the table has not been created, I think the only reliable way to know is actually checking the db [08:15:41] which is ok, just it takes more time [08:27:55] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2058 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343255 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [08:29:59] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2058 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343255 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [08:30:09] (03CR) 10jenkins-bot: db-codfw.php: Depool db2058 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343255 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [08:31:03] 06Operations, 10ops-eqiad, 10DBA, 10Phabricator: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3109156 (10Marostegui) db1047's BBU is acting weirdly It goes from Failed -> Charging -> Failed It is acting very weirdly, it has gone from ``` Relative State of Charge: 4 % Charge... 
[08:31:53] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2058 - T160415 - T73563 (duration: 00m 43s) [08:32:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:32:01] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [08:32:01] T73563: *_minor_mime are varbinary(32) on WMF sites, out of sync with varbinary(100) in MW core - https://phabricator.wikimedia.org/T73563 [08:32:07] !log Deploy schema change on dbstore2002 and db2058 (s4) - T160415 T73563 [08:32:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:42:09] jynus: I guess you can also pass something like "var_dump( ExtensionRegistry::getInstance()->isLoaded( 'RevisionSlider' ) );" into eval.php for the wiki [08:42:22] Just make sure you get the correct name needed to pass in there [08:43:08] nah, I do not want to bother app servers for that [08:43:15] but thanks for the suggestion [08:45:46] PROBLEM - puppet last run on cp3040 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:47:14] PROBLEM - puppet last run on labcontrol1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [08:49:08] (03PS1) 10Giuseppe Lavagetto: role::configcluster: temporary fix for etcd replication [puppet] - 10https://gerrit.wikimedia.org/r/343259 [08:49:33] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] role::configcluster: temporary fix for etcd replication [puppet] - 10https://gerrit.wikimedia.org/r/343259 (owner: 10Giuseppe Lavagetto) [08:55:34] PROBLEM - MariaDB Slave Lag: s4 on dbstore2002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 871.23 seconds [08:56:33] ^ me - I thought it was silenced [08:59:29] (03CR) 10Hashar: "My understanding is:" [puppet] - 10https://gerrit.wikimedia.org/r/340900 (https://phabricator.wikimedia.org/T156120) (owner: 10Paladox) [09:02:51] (03PS4) 10Muehlenhoff: Fix --turn-volunteer mode in LDAP offboarding script [puppet] - 10https://gerrit.wikimedia.org/r/343252 [09:04:06] !log killing 11h-running query on db1089 from terbium (orphan process) [09:04:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:06:59] (03PS2) 10Gehel: wdqs - remove unneeded puppet dependency [puppet] - 10https://gerrit.wikimedia.org/r/343128 [09:07:47] (03PS1) 10Muehlenhoff: Also add --turn-volunteer mode for Phabricator offboarding [puppet] - 10https://gerrit.wikimedia.org/r/343260 [09:08:31] (03CR) 10Gehel: [C: 032] wdqs - remove unneeded puppet dependency [puppet] - 10https://gerrit.wikimedia.org/r/343128 (owner: 10Gehel) [09:08:42] (03CR) 10Paladox: "@Hashar hi, I've done https://gerrit-review.googlesource.com/#/c/99004/ but it will still need some rewrites as I haven't fixed it all in " [puppet] - 10https://gerrit.wikimedia.org/r/340900 (https://phabricator.wikimedia.org/T156120) (owner: 10Paladox) [09:10:49] (03PS9) 10Paladox: Gerrit: Add some apache rewrite rules for polygerrit [puppet] - 10https://gerrit.wikimedia.org/r/340900 (https://phabricator.wikimedia.org/T156120) [09:11:11] (03PS10) 10Paladox: Gerrit: Add some apache rewrite rules for polygerrit [puppet] 
- 10https://gerrit.wikimedia.org/r/340900 (https://phabricator.wikimedia.org/T156120) [09:11:22] (03PS2) 10Muehlenhoff: Also add --turn-volunteer mode for Phabricator offboarding [puppet] - 10https://gerrit.wikimedia.org/r/343260 [09:12:19] (03CR) 10Paladox: "> so you are saying you might not need this at all? that would be" [puppet] - 10https://gerrit.wikimedia.org/r/340900 (https://phabricator.wikimedia.org/T156120) (owner: 10Paladox) [09:14:47] RECOVERY - puppet last run on cp3040 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [09:16:12] (03PS1) 10Gilles: Consolidate Performance team's root access [puppet] - 10https://gerrit.wikimedia.org/r/343262 (https://phabricator.wikimedia.org/T151065) [09:16:17] RECOVERY - puppet last run on labcontrol1001 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [09:16:36] (03CR) 10Muehlenhoff: [C: 032] Also add --turn-volunteer mode for Phabricator offboarding [puppet] - 10https://gerrit.wikimedia.org/r/343260 (owner: 10Muehlenhoff) [09:17:24] (03PS3) 10Gilles: Enable memcache-based Thumbor broken thumbnail throttling [puppet] - 10https://gerrit.wikimedia.org/r/342811 (https://phabricator.wikimedia.org/T151065) [09:23:03] (03PS1) 10Gilles: Make Thumbor connect to Swift via https [puppet] - 10https://gerrit.wikimedia.org/r/343263 (https://phabricator.wikimedia.org/T160670) [09:33:32] (03CR) 10Muehlenhoff: "Please file an access request ticket, this needs to be acked in Monday's Ops meeting:" [puppet] - 10https://gerrit.wikimedia.org/r/343262 (https://phabricator.wikimedia.org/T151065) (owner: 10Gilles) [09:38:58] 06Operations, 10Ops-Access-Requests: Requesting access to perf-roots for gilles - https://phabricator.wikimedia.org/T160736#3109245 (10Gilles) [09:39:05] (03CR) 10Gilles: "https://phabricator.wikimedia.org/T160736" [puppet] - 10https://gerrit.wikimedia.org/r/343262 (https://phabricator.wikimedia.org/T151065) (owner: 10Gilles) [09:39:47] PROBLEM - puppet last 
run on cp3043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:45:57] 06Operations, 10Graphite, 06Performance-Team: Increase Grafana user rights for Performance team members - https://phabricator.wikimedia.org/T160738#3109276 (10Gilles) [09:49:07] 06Operations, 10Graphite, 06Performance-Team: Increase Grafana user rights for Performance team members - https://phabricator.wikimedia.org/T160738#3109294 (10Gilles) I can't find where that's defined in Puppet. [09:56:27] PROBLEM - puppet last run on ms-be1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:01:09] 06Operations: Enhance account handling (meta bug) - https://phabricator.wikimedia.org/T142815#3109303 (10MoritzMuehlenhoff) [10:01:11] 06Operations: Offboarding script for account handling - https://phabricator.wikimedia.org/T142825#3109301 (10MoritzMuehlenhoff) 05Open>03Resolved An offboarding script for LDAP and Phabricator has been added to puppet.git, it's available on terbium as offboard-user. Docs have been updated at https://office.w... [10:03:35] 06Operations, 07LDAP: Synchronise groups defined in data.yaml to LDAP - https://phabricator.wikimedia.org/T142821#3109305 (10MoritzMuehlenhoff) p:05Normal>03Low Memberships of wmf/nda/ops are already checked. It would be rather straightforward to implement a sync of selected groups from data.yaml to LDAP.... [10:07:17] PROBLEM - puppet last run on db1087 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:08:47] RECOVERY - puppet last run on cp3043 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [10:16:56] 06Operations, 07Wikimedia-log-errors: Warning: timed out after 0.2 seconds when connecting to rdb1001.eqiad.wmnet [110]: Connection timed out - https://phabricator.wikimedia.org/T125735#3109323 (10elukey) Interesting thing that I found today while debugging with tcpdump. 
There are constant TCP RST packets that... [10:24:27] RECOVERY - puppet last run on ms-be1022 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [10:25:46] (03PS1) 10Ema: cache_text varnishtest: 'Vary: Cookie' and Non-Session cookies [puppet] - 10https://gerrit.wikimedia.org/r/343267 [10:26:17] PROBLEM - puppet last run on mw1222 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:28:13] (03PS2) 10Ema: cache_text varnishtest: 'Vary: Cookie' and Non-Session cookies [puppet] - 10https://gerrit.wikimedia.org/r/343267 (https://phabricator.wikimedia.org/T154954) [10:29:52] (03CR) 10Ema: [V: 032 C: 032] cache_text varnishtest: 'Vary: Cookie' and Non-Session cookies [puppet] - 10https://gerrit.wikimedia.org/r/343267 (https://phabricator.wikimedia.org/T154954) (owner: 10Ema) [10:34:29] (03CR) 10Hashar: [C: 04-1] "We have added PHP 7 to be able to run esty/phan static code analyzer and that is about it. CI does not have the capacity to run PHP 7 job" [puppet] - 10https://gerrit.wikimedia.org/r/343209 (owner: 10Paladox) [10:36:17] RECOVERY - puppet last run on db1087 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [10:39:38] (03CR) 10Alexandros Kosiaris: [C: 031] postgresql - require postgresql / postgis packages for spatialdb [puppet] - 10https://gerrit.wikimedia.org/r/343088 (owner: 10Gehel) [10:43:31] 06Operations, 10Phabricator: Upload php7.1 to apt.wm.org - https://phabricator.wikimedia.org/T160714#3109333 (10MoritzMuehlenhoff) 05Open>03declined We won't provide PHP 7.1 in the foreseeable future. Providing a package like PHP on apt.wikimedia.org is not a one time effort, but requires continuous mainte... 
[10:43:38] (03PS1) 10Volans: Add stages to set DB read-only/read-write mode [switchdc] - 10https://gerrit.wikimedia.org/r/343270 (https://phabricator.wikimedia.org/T160178) [10:48:02] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Traffic, and 2 others: Purge Varnish cache when a banner is saved - https://phabricator.wikimedia.org/T154954#3109339 (10ema) >>! In T154954#3108634, @Ejegg wrote: > Wouldn't Vary: Cookie explode caching all over the place due to... [10:51:04] 06Operations, 10Phabricator: Upload php7.1 to apt.wm.org - https://phabricator.wikimedia.org/T160714#3109356 (10Paladox) We won't be able to apply backwards compatible fix for phabricator. Since the daemons uses a php 7.1 feature and a 5.* feature. It won't be able to support php 7.0. [10:51:53] 06Operations, 10ops-eqiad, 10DBA, 10Phabricator: db1048 BBU Faulty - slave lagging - https://phabricator.wikimedia.org/T160731#3109357 (10Marostegui) Unfortunately, db1047's BBU looks totally broken, it is not making any sense in what it reports. Some places it says it is fully charged, some others don't,... [10:53:09] (03PS2) 10Alexandros Kosiaris: Add private LVS IPs in network::subnets data [puppet] - 10https://gerrit.wikimedia.org/r/341787 [10:53:16] (03PS3) 10Alexandros Kosiaris: Add private LVS IPs in network::subnets data [puppet] - 10https://gerrit.wikimedia.org/r/341787 [10:53:28] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Add private LVS IPs in network::subnets data [puppet] - 10https://gerrit.wikimedia.org/r/341787 (owner: 10Alexandros Kosiaris) [10:54:17] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [10:54:17] RECOVERY - puppet last run on mw1222 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [10:57:30] 06Operations, 10DBA, 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#3109383 (10Paladox) Upstream gerrit has added in built support for the MariaDB connector. So we won't need it pac... [10:58:13] (03CR) 10Muehlenhoff: [C: 04-1] "Hold this until gerrit is switched to systemd, this should ensure a proper service stop and I doubt this hack would be needed any longer" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/342082 (owner: 10Paladox) [10:58:48] !log reimage helium.eqiad.wmnet to jessie [10:58:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:59:06] (03CR) 10Muehlenhoff: [C: 032] Gerrit: Add a systemd init script fro gerrit [debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475 (owner: 10Paladox) [11:05:51] (03PS1) 10Muehlenhoff: Bump version for new package with gerrit systemd unit [debs/gerrit] - 10https://gerrit.wikimedia.org/r/343275 [11:06:49] (03PS1) 10Elukey: Add delay compress to upstart's logrotate [puppet] - 10https://gerrit.wikimedia.org/r/343276 (https://phabricator.wikimedia.org/T132324) [11:07:01] 06Operations, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, 13Patch-For-Review, 07Wikimedia-Multiple-active-datacenters: MediaWiki Datacenter Switchover automation - https://phabricator.wikimedia.org/T160178#3109398 (10Volans) [11:08:00] (03CR) 10Elukey: [C: 032] Add delay compress to upstart's logrotate [puppet] - 10https://gerrit.wikimedia.org/r/343276 (https://phabricator.wikimedia.org/T132324) (owner: 10Elukey) [11:11:07] 06Operations, 07Wikimedia-log-errors: Warning: timed out after 0.2 seconds when connecting to rdb1001.eqiad.wmnet [110]: Connection timed out - https://phabricator.wikimedia.org/T125735#3109415 
(10elukey) Another thing worth to discuss is the following snippet: https://github.com/wikimedia/mediawiki/blob/wmf/... [11:11:54] (03CR) 10Muehlenhoff: [C: 032] Bump version for new package with gerrit systemd unit [debs/gerrit] - 10https://gerrit.wikimedia.org/r/343275 (owner: 10Muehlenhoff) [11:20:03] (03PS7) 10Giuseppe Lavagetto: Add stages to manage maintenance [switchdc] - 10https://gerrit.wikimedia.org/r/342806 (https://phabricator.wikimedia.org/T160178) [11:22:17] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [11:28:06] (03PS1) 10Muehlenhoff: Fix Debian source format [debs/gerrit] - 10https://gerrit.wikimedia.org/r/343280 [11:29:57] (03CR) 10Muehlenhoff: "I've built packages, they're available at https://people.wikimedia.org/~jmm/gerrit/" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475 (owner: 10Paladox) [11:31:29] (03CR) 10Muehlenhoff: [V: 032 C: 032] Fix Debian source format [debs/gerrit] - 10https://gerrit.wikimedia.org/r/343280 (owner: 10Muehlenhoff) [11:32:45] 06Operations, 06Performance-Team, 10Wikimedia-Site-requests, 07Performance: Increase $wgExpensiveParserFunctionLimit on nowiki - https://phabricator.wikimedia.org/T160685#3109425 (10jeblad) Btw, the code at Wikidata can be found at [[ https://www.wikidata.org/wiki/Module:Cycling_race | d:Module:Cycling rac... 
[11:33:24] !log reimage analytics1044 (Hadoop Worker node) to Debian Jessie [11:33:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:35:12] 06Operations, 06Performance-Team, 10Wikidata, 10Wikimedia-Site-requests, 07Performance: Increase $wgExpensiveParserFunctionLimit on nowiki - https://phabricator.wikimedia.org/T160685#3109427 (10Reedy) [11:39:51] (03PS2) 10Volans: Add stages to set DB read-only/read-write mode [switchdc] - 10https://gerrit.wikimedia.org/r/343270 (https://phabricator.wikimedia.org/T160178) [11:41:14] (03PS3) 10Volans: Add stages to set DB read-only/read-write mode [switchdc] - 10https://gerrit.wikimedia.org/r/343270 (https://phabricator.wikimedia.org/T160178) [11:52:31] 06Operations, 10Analytics, 10Analytics-Cluster, 13Patch-For-Review, 15User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3109448 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['analy... [12:15:31] (03PS1) 10Jcrespo: osm: Change osm master rsync check with different parameter order [puppet] - 10https://gerrit.wikimedia.org/r/343285 (https://phabricator.wikimedia.org/T157359) [12:17:23] (03CR) 10Alexandros Kosiaris: [C: 031] osm: Change osm master rsync check with different parameter order [puppet] - 10https://gerrit.wikimedia.org/r/343285 (https://phabricator.wikimedia.org/T157359) (owner: 10Jcrespo) [12:18:04] 06Operations, 10Analytics, 10Analytics-Cluster, 13Patch-For-Review, 15User-Elukey: Reimage all the Hadoop worker nodes to Debian Jessie - https://phabricator.wikimedia.org/T160333#3109495 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['analytics1044.eqiad.wmnet'] ``` and were **ALL** succe... 
[12:18:18] (03PS2) 10Rush: toolschecker: remove precise checks [puppet] - 10https://gerrit.wikimedia.org/r/342161 (https://phabricator.wikimedia.org/T94792) (owner: 10BryanDavis) [12:20:00] (03CR) 10Jcrespo: [C: 032] osm: Change osm master rsync check with different parameter order [puppet] - 10https://gerrit.wikimedia.org/r/343285 (https://phabricator.wikimedia.org/T157359) (owner: 10Jcrespo) [12:20:16] (03CR) 10Rush: [C: 032] toolschecker: remove precise checks [puppet] - 10https://gerrit.wikimedia.org/r/342161 (https://phabricator.wikimedia.org/T94792) (owner: 10BryanDavis) [12:23:52] (03PS3) 10Rush: toolschecker: remove precise checks [puppet] - 10https://gerrit.wikimedia.org/r/342161 (https://phabricator.wikimedia.org/T94792) (owner: 10BryanDavis) [12:25:10] 06Operations, 06Labs: Mount /public/dumps for osmit project - https://phabricator.wikimedia.org/T156586#3109497 (10chasemp) 05Open>03Resolved a:03chasemp > labstore1003.eqiad.wmnet:/dumps nfs4 28T 18T 11T 64% /public/dumps [12:25:18] (03PS4) 10Rush: toolschecker: remove precise checks [puppet] - 10https://gerrit.wikimedia.org/r/342161 (https://phabricator.wikimedia.org/T94792) (owner: 10BryanDavis) [12:26:21] (03PS4) 10Volans: Add stages to set DB read-only/read-write mode [switchdc] - 10https://gerrit.wikimedia.org/r/343270 (https://phabricator.wikimedia.org/T160178) [12:27:27] (03CR) 10Giuseppe Lavagetto: Add stages to set DB read-only/read-write mode (035 comments) [switchdc] - 10https://gerrit.wikimedia.org/r/343270 (https://phabricator.wikimedia.org/T160178) (owner: 10Volans) [12:29:40] (03CR) 10Rush: [C: 032] toolschecker: remove precise checks [puppet] - 10https://gerrit.wikimedia.org/r/342161 (https://phabricator.wikimedia.org/T94792) (owner: 10BryanDavis) [12:30:33] (03PS3) 10Rush: etcd: etcd-backup.py needs a type set for argparse 'keep' [puppet] - 10https://gerrit.wikimedia.org/r/341607 [12:31:12] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: labsdb1006/1007 
(postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3109518 (10jcrespo) Unless anyone else says so, I will reimage the old server on Monday. Last chance to check data and functionality works on the ne... [12:32:13] (03CR) 10Giuseppe Lavagetto: [C: 032] etcd: etcd-backup.py needs a type set for argparse 'keep' [puppet] - 10https://gerrit.wikimedia.org/r/341607 (owner: 10Rush) [12:32:22] (03CR) 10Rush: [C: 032] etcd: etcd-backup.py needs a type set for argparse 'keep' [puppet] - 10https://gerrit.wikimedia.org/r/341607 (owner: 10Rush) [12:32:54] 06Operations, 15User-Elukey, 07Wikimedia-log-errors: Warning: timed out after 0.2 seconds when connecting to rdb1001.eqiad.wmnet [110]: Connection timed out - https://phabricator.wikimedia.org/T125735#3109531 (10elukey) [12:33:41] maybe this could be interesting for the media folks: https://betanews.com/2017/03/16/google-guetzli-open-source-jpeg-encoder/ [12:36:10] 06Operations, 10ops-eqiad, 06Labs, 10Labs-Infrastructure: Labvirt1001 has insanely slow IO - https://phabricator.wikimedia.org/T159835#3109536 (10chasemp) p:05Triage>03High [12:44:13] !log labsdb10[09|10|11] maintain-views --table user_groups --all-database --replace-all --debug [12:44:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:17] (03CR) 10Zppix: [C: 031] Move contribution tracking config to CommonSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342857 (https://phabricator.wikimedia.org/T147479) (owner: 10Chad) [12:46:47] !log labsdb10[01|03] maintain-views --table user_groups --all-database --replace-all --debug [12:46:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:50:13] (03CR) 10Rush: [C: 04-1] "image seems broken -- I wasn't able to get it to return within current timeout for creation" [puppet] - 10https://gerrit.wikimedia.org/r/343207 (owner: 10Andrew Bogott) [12:54:17] 06Operations, 06Labs: openstack instance creation sometimes takes >480s - 
https://phabricator.wikimedia.org/T159459#3109559 (10hashar) p:05High>03Normal Since labvirt1001 and labvirt1002 have been removed from the scheduler pool, the time to get ssh access has significantly dropped and seems to be rather s... [12:55:38] (03CR) 10Paladox: "Thanks." [debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475 (owner: 10Paladox) [13:02:21] 06Operations, 06Labs: paramiko (python SSH implementation) needs older hashes for host authentication - https://phabricator.wikimedia.org/T106871#3109576 (10chasemp) 05Open>03Invalid We removed paramiko from the backup pipeline [13:22:29] (03Draft1) 10Paladox: Fix some Debian lintian warnnings for the gerrit package [debs/gerrit] - 10https://gerrit.wikimedia.org/r/343297 [13:22:31] (03PS2) 10Paladox: Fix some Debian lintian warnnings for the gerrit package [debs/gerrit] - 10https://gerrit.wikimedia.org/r/343297 [13:26:12] (03PS1) 10Marostegui: Revert "db-codfw.php: Depool db2058" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343299 [13:31:07] (03CR) 10Marostegui: [C: 032] Revert "db-codfw.php: Depool db2058" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343299 (owner: 10Marostegui) [13:34:50] 06Operations, 10hardware-requests: codfw: (2) servers request for ORES redis databases - https://phabricator.wikimedia.org/T142190#3109635 (10mark) [13:37:36] (03Merged) 10jenkins-bot: Revert "db-codfw.php: Depool db2058" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343299 (owner: 10Marostegui) [13:37:46] (03CR) 10jenkins-bot: Revert "db-codfw.php: Depool db2058" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343299 (owner: 10Marostegui) [13:39:08] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2058 - T160415 - T73563 (duration: 01m 06s) [13:39:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:39:16] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [13:39:16] 
T73563: *_minor_mime are varbinary(32) on WMF sites, out of sync with varbinary(100) in MW core - https://phabricator.wikimedia.org/T73563 [13:40:06] (03PS1) 10Alexandros Kosiaris: akosiaris dot files: Enable vim dark background [puppet] - 10https://gerrit.wikimedia.org/r/343300 [13:40:29] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] akosiaris dot files: Enable vim dark background [puppet] - 10https://gerrit.wikimedia.org/r/343300 (owner: 10Alexandros Kosiaris) [13:41:24] 06Operations, 06Discovery, 06Discovery-Search (Current work), 13Patch-For-Review: remove swap from elasticsearch servers - https://phabricator.wikimedia.org/T158884#3050526 (10Deskana) >>! In T158884#3099126, @Gehel wrote: > Swap is now disabled on all elasticsearch servers. [[ https://gerrit.wikimedia.org... [13:41:41] 06Operations, 06Discovery, 06Discovery-Search (Current work), 13Patch-For-Review: remove swap from elasticsearch servers - https://phabricator.wikimedia.org/T158884#3050526 (10Deskana) 05Open>03Resolved [13:42:08] 06Operations, 06Discovery, 06Discovery-Search (Current work), 13Patch-For-Review: remove swap from elasticsearch servers - https://phabricator.wikimedia.org/T158884#3109671 (10Gehel) Correct! Thanks for the cleanup! [13:42:51] (03PS1) 10Gehel: wdqs - remove trebuchet based deployment [puppet] - 10https://gerrit.wikimedia.org/r/343302 [13:43:57] (03CR) 10jerkins-bot: [V: 04-1] wdqs - remove trebuchet based deployment [puppet] - 10https://gerrit.wikimedia.org/r/343302 (owner: 10Gehel) [13:50:49] PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). [13:51:49] RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge. 
[13:55:25] (03PS2) 10Gehel: wdqs - remove trebuchet based deployment [puppet] - 10https://gerrit.wikimedia.org/r/343302 [13:55:40] RECOVERY - MariaDB Slave Lag: s4 on dbstore2002 is OK: OK slave_sql_lag Replication lag: 53.55 seconds [13:57:02] (03PS5) 10Volans: Add stages to set DB read-only/read-write mode [switchdc] - 10https://gerrit.wikimedia.org/r/343270 (https://phabricator.wikimedia.org/T160178) [13:57:07] (03PS1) 10Marostegui: db-codfw.php: Depool db2051 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343303 (https://phabricator.wikimedia.org/T160415) [13:57:10] (03CR) 10Volans: "replies inline" (035 comments) [switchdc] - 10https://gerrit.wikimedia.org/r/343270 (https://phabricator.wikimedia.org/T160178) (owner: 10Volans) [13:58:39] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2051 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343303 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [13:59:19] (03CR) 10Alexandros Kosiaris: [C: 04-1] "this is pretty good now, one last minor comment and I think we are ok" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010) (owner: 10Ladsgroup) [14:00:01] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2051 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343303 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [14:00:11] (03CR) 10jenkins-bot: db-codfw.php: Depool db2051 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343303 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [14:00:25] (03PS1) 10Hashar: contint: drop a hack for python PIL [puppet] - 10https://gerrit.wikimedia.org/r/343305 (https://phabricator.wikimedia.org/T101550) [14:00:54] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2051 - T160415 - T73563 (duration: 00m 42s) [14:01:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:01:02] T160415: Review schema changes for T125071 
- Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [14:01:02] T73563: *_minor_mime are varbinary(32) on WMF sites, out of sync with varbinary(100) in MW core - https://phabricator.wikimedia.org/T73563 [14:01:36] !log Deploy schema change on dbstore1001 and db2051 (s4) - T160415 - T73563 [14:01:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:03:32] (03PS1) 10Hashar: contint: remove Precise related switches [puppet] - 10https://gerrit.wikimedia.org/r/343306 (https://phabricator.wikimedia.org/T158652) [14:04:08] (03CR) 10Alexandros Kosiaris: [C: 032] contint: drop a hack for python PIL [puppet] - 10https://gerrit.wikimedia.org/r/343305 (https://phabricator.wikimedia.org/T101550) (owner: 10Hashar) [14:04:16] \O/ [14:04:30] heh [14:04:48] hashar: you make me feel like your patches take ages to get merged ;-) [14:05:13] (03CR) 10Hashar: [C: 04-1] "That would be for Monday. Merging this can potentially break other slaves and I can't really monitor them today or this week end." 
[puppet] - 10https://gerrit.wikimedia.org/r/343306 (https://phabricator.wikimedia.org/T158652) (owner: 10Hashar) [14:05:42] \o/ [14:05:48] akosiaris: I don't have statistics, but important patches usually get merged under an hour or so [14:05:58] nice to see precises going away [14:06:09] I mean anything that really have a production impact and are not a huge pile of random puppet copy pasted from stack overflow :} [14:06:11] (03CR) 10Rush: "reason for -1 seems to have been a one-off and while not cool, probably not related" [puppet] - 10https://gerrit.wikimedia.org/r/343207 (owner: 10Andrew Bogott) [14:06:15] (03PS3) 10Rush: Nova fullstack test: Switch to a testing image, temporarily [puppet] - 10https://gerrit.wikimedia.org/r/343207 (owner: 10Andrew Bogott) [14:06:51] gonna clean up the mediawiki module now [14:07:27] akosiaris: I am also hoping to drop Trusty from CI by end of june [14:08:24] (03CR) 10Andrew Bogott: [C: 032] Nova fullstack test: Switch to a testing image, temporarily [puppet] - 10https://gerrit.wikimedia.org/r/343207 (owner: 10Andrew Bogott) [14:08:39] PROBLEM - puppet last run on elastic1052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:08:49] hashar: would you be able to ? jesse is shipping with php 5.6 and you might want to stick to 5.5 ? [14:09:43] akosiaris: we have a task to use PHP 5.5 packages on Jessie. [14:09:53] sury.org build some that are coinstallable \O/ [14:10:08] so we can have CI to invoke either php55 (Zend 5.5) or php5 (Zend 5.6) [14:10:09] ah, lemme know how that goes [14:10:30] and we already do that for php7 . 
Kunal set that up iirc [14:10:37] ah yes indeed [14:11:05] and once CI has phased out Trusty, I guess we can generate the base images using bootstrap-vz \O/ [14:14:25] (03PS2) 10Eevans: Enable encrypted client connections in RESTBase production [puppet] - 10https://gerrit.wikimedia.org/r/342903 (https://phabricator.wikimedia.org/T111113) [14:18:10] (03PS1) 10Hashar: mediawiki: remove Precise class packages::legacy [puppet] - 10https://gerrit.wikimedia.org/r/343309 (https://phabricator.wikimedia.org/T158652) [14:18:15] (03PS3) 10Eevans: Enable encrypted client connections in RESTBase production [puppet] - 10https://gerrit.wikimedia.org/r/342903 (https://phabricator.wikimedia.org/T111113) [14:19:35] (03CR) 10Hashar: "At least for CI and beta cluster this should be fine. For production I don't think we have any Precise servers still using mediawiki, tha" [puppet] - 10https://gerrit.wikimedia.org/r/343309 (https://phabricator.wikimedia.org/T158652) (owner: 10Hashar) [14:19:39] RECOVERY - bacula sd process on helium is OK: PROCS OK: 1 process with UID = 110 (bacula), command name bacula-sd [14:20:14] 06Operations, 10Analytics, 10DBA: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3109759 (10Halfak) OK. Time to try to ping the larger set of people who have databases here. Here's the databases that match Phab users: * @dartar (dartar) * @drdee (diederi... [14:20:19] PROBLEM - puppet last run on ocg1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:21:10] 07Puppet, 06Labs, 13Patch-For-Review: Labs: Could not find dependency File[/usr/lib/ganglia/python_modules] for File[/usr/lib/ganglia/python_modules/gmond_memcached.py] - https://phabricator.wikimedia.org/T95107#3109763 (10chasemp) 05Open>03Resolved a:03chasemp closing due to age and activity (seems fi... 
[14:21:44] (03CR) 10Eevans: "Puppet compiler output: http://puppet-compiler.wmflabs.org/5811" [puppet] - 10https://gerrit.wikimedia.org/r/342903 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [14:21:45] 06Operations, 10ops-eqiad, 10DBA: db1047 BBU RAID issues (was: Investigate db1047 replication lag) - https://phabricator.wikimedia.org/T159266#3109766 (10Marostegui) 05Open>03stalled Let's block this as db1047 might be decommissioned soon as per: T156844 [14:32:37] 06Operations, 10Analytics, 10DBA: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3109809 (10Ottomata) Wow, old DB! decleramaul, nimish and rfaulk we can certainly get rid of. [14:32:49] 06Operations, 10DNS, 10Parsoid, 10Traffic: Separate subdomain for parsoid visual diff test service on ruthenium - https://phabricator.wikimedia.org/T159995#3109810 (10ssastry) Thanks. Yes, it works now. [14:34:12] (03PS5) 10BBlack: [WIP/POC] DNS zones to puppet repo [puppet] - 10https://gerrit.wikimedia.org/r/342887 [14:35:39] (03CR) 10jerkins-bot: [V: 04-1] [WIP/POC] DNS zones to puppet repo [puppet] - 10https://gerrit.wikimedia.org/r/342887 (owner: 10BBlack) [14:36:54] (03PS6) 10BBlack: [WIP/POC] DNS zones to puppet repo [puppet] - 10https://gerrit.wikimedia.org/r/342887 [14:37:41] RECOVERY - puppet last run on elastic1052 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [14:37:58] (03PS1) 10Gehel: wdqs - make nginx ports configureable [puppet] - 10https://gerrit.wikimedia.org/r/343312 [14:38:26] (03CR) 10jerkins-bot: [V: 04-1] [WIP/POC] DNS zones to puppet repo [puppet] - 10https://gerrit.wikimedia.org/r/342887 (owner: 10BBlack) [14:39:15] (03PS10) 10Ladsgroup: service: Send uwsgi logs to logstash [puppet] - 10https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010) [14:41:34] (03PS7) 10BBlack: [WIP/POC] DNS zones to puppet repo [puppet] - 10https://gerrit.wikimedia.org/r/342887 [14:41:36] (03CR) 
10Gehel: [C: 032] wdqs - make nginx ports configureable [puppet] - 10https://gerrit.wikimedia.org/r/343312 (owner: 10Gehel) [14:42:37] (03CR) 10Ladsgroup: service: Send uwsgi logs to logstash (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010) (owner: 10Ladsgroup) [14:46:51] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 642187 [14:47:41] urandom: o/ [14:48:21] RECOVERY - puppet last run on ocg1002 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [14:48:26] for restbase-dev1001 I think that we'd need to manually fail sdc in the md arrays, then clear (in some way) the megacli status and then rebuild the md raids [14:49:12] elukey: i was just going to ask you about this... [14:49:43] elukey: you sound less than certain here :) [14:51:33] urandom: yes not a big expert but I can try :) [14:51:43] I had to deal with megacli for the analytics hosts [14:52:05] elukey: these *are* the analytics hosts :) [14:52:13] reincarnate! [14:52:24] nope the AQS hosts! [14:52:26] :P [14:52:32] oh, right right [14:53:43] sheesh, the megacli output is impenetrable [14:53:48] * urandom 's eyes bleed [14:54:08] urandom: hahahah [14:54:23] so the raid0 array will lose all the data of course [14:54:42] yeah [14:54:57] I propose to force failure of /dev/sdc partitions on all the md arrays [14:55:01] remove them [14:55:06] clear the status on megacli [14:55:12] add partitions etc.. [14:55:15] 06Operations, 10ops-eqiad, 10DBA: db1047 BBU RAID issues (was: Investigate db1047 replication lag) - https://phabricator.wikimedia.org/T159266#3109863 (10Cmjohnson) @marostegui There are a few decom db's now I could swap out the bbu if you like or just proceed with the decom process. Let me know your prefe... [14:55:16] rebuild or reimage [14:55:39] the raid1 can be rebuilt, i assume [14:56:00] and /srv/ reformatted... [14:56:07] yes.. 
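Editor's note: the plan proposed above (force-fail the /dev/sdc partitions in the md arrays, remove them, clear the controller state, then add the partitions back) can be sketched as a small command generator. This is only an illustration under stated assumptions: the array-to-partition layout below is hypothetical and not taken from restbase-dev1001, the commands are printed rather than executed, and the fail/remove/add sequence only makes sense for redundant arrays such as the raid1 mentioned in the log. A raid0 array has no redundancy and would have to be recreated instead.

```python
from typing import Dict, List, Tuple


def md_swap_plan(layout: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Build mdadm commands for swapping one disk out of redundant md arrays.

    layout maps an md array to the member partition on the disk being
    replaced. Returns (before_swap, after_swap) command lists; nothing
    is executed here.
    """
    before_swap: List[str] = []
    after_swap: List[str] = []
    for array, member in layout.items():
        # Mark the member as faulty, then detach it from the array.
        before_swap.append(f"mdadm --manage {array} --fail {member}")
        before_swap.append(f"mdadm --manage {array} --remove {member}")
        # Once the disk is usable again, re-adding it starts the rebuild.
        after_swap.append(f"mdadm --manage {array} --add {member}")
    return before_swap, after_swap


# Hypothetical layout for illustration only, NOT the real host layout.
example_layout = {"/dev/md0": "/dev/sdc1", "/dev/md2": "/dev/sdc3"}
before, after = md_swap_plan(example_layout)
for command in before + after:
    print(command)
```

Keeping the "before" and "after" lists separate mirrors the order discussed in the channel: the controller-side cleanup happens between removing the members and adding them back.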
[14:56:22] (03PS8) 10BBlack: [WIP/POC] DNS zones to puppet repo [puppet] - 10https://gerrit.wikimedia.org/r/342887 [14:57:54] urandom: standup and then I'll do some tests [14:57:56] ok? [14:58:01] 06Operations, 10ops-eqiad, 10DBA: db1047 BBU RAID issues (was: Investigate db1047 replication lag) - https://phabricator.wikimedia.org/T159266#3109864 (10Marostegui) Hey @Cmjohnson! let's wait to see if that ticket keeps progressing for now, if the server is going to get decommissioned it would be just a was... [14:58:01] elukey: sure! [14:58:31] PROBLEM - Check systemd state on helium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [14:58:41] PROBLEM - bacula director process on helium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 110 (bacula), command name bacula-dir [14:58:42] PROBLEM - bacula sd process on helium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 110 (bacula), command name bacula-sd [14:59:21] 06Operations, 10ops-eqiad, 10DBA: db1047 BBU RAID issues (was: Investigate db1047 replication lag) - https://phabricator.wikimedia.org/T159266#3109866 (10jcrespo) @Marostegui precisely on that ticket they are discussing when they will be able to decom it, and it is not going to happen per months as it looks.... [14:59:48] not sure what that is but I imagine worth an akosiaris^ ping [15:00:02] yeah aware, acknowleding right now [15:00:31] damn reimaging did not go very well, thankfully I had taken some preventive measures [15:00:32] kk [15:01:01] ACKNOWLEDGEMENT - Check systemd state on helium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. alexandros kosiaris resyncing back from heze. reimaging did not go well, getting backups back from heze [15:01:02] ACKNOWLEDGEMENT - bacula director process on helium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 110 (bacula), command name bacula-dir alexandros kosiaris resyncing back from heze. 
reimaging did not go well, getting backups back from heze [15:01:02] ACKNOWLEDGEMENT - bacula sd process on helium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 110 (bacula), command name bacula-sd alexandros kosiaris resyncing back from heze. reimaging did not go well, getting backups back from heze [15:02:01] (03CR) 10Alexandros Kosiaris: [C: 032] "https://puppet-compiler.wmflabs.org/5813/scb1001.eqiad.wmnet/ looks good to me. Merging. Thank you!" [puppet] - 10https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010) (owner: 10Ladsgroup) [15:02:07] (03PS11) 10Alexandros Kosiaris: service: Send uwsgi logs to logstash [puppet] - 10https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010) (owner: 10Ladsgroup) [15:02:10] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] service: Send uwsgi logs to logstash [puppet] - 10https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010) (owner: 10Ladsgroup) [15:03:11] (03PS3) 10Gehel: wdqs - remove trebuchet based deployment [puppet] - 10https://gerrit.wikimedia.org/r/343302 [15:05:37] (03PS1) 10Jcrespo: Add DROP privileges to testreduce databases to ssastry [puppet] - 10https://gerrit.wikimedia.org/r/343314 (https://phabricator.wikimedia.org/T160691) [15:13:26] 06Operations, 10ops-eqiad, 10DBA: db1047 BBU RAID issues (was: Investigate db1047 replication lag) - https://phabricator.wikimedia.org/T159266#3109921 (10Marostegui) >>! In T159266#3109866, @jcrespo wrote: > @Marostegui precisely on that ticket they are discussing when they will be able to decom it, and it i... 
[15:16:32] (03PS8) 10Giuseppe Lavagetto: Add stages to manage maintenance [switchdc] - 10https://gerrit.wikimedia.org/r/342806 (https://phabricator.wikimedia.org/T160178) [15:21:48] 06Operations, 10ops-eqiad, 10DBA: db1047 BBU RAID issues (was: Investigate db1047 replication lag) - https://phabricator.wikimedia.org/T159266#3109947 (10Ottomata) Given the responses so far, I think we will be able to decom it soon. But, we should wait a while (maybe a week) to collect more feedback to be... [15:21:56] (03CR) 10Marostegui: [C: 031] Add DROP privileges to testreduce databases to ssastry [puppet] - 10https://gerrit.wikimedia.org/r/343314 (https://phabricator.wikimedia.org/T160691) (owner: 10Jcrespo) [15:26:59] 06Operations, 10ops-eqiad, 10DBA: db1047 BBU RAID issues (was: Investigate db1047 replication lag) - https://phabricator.wikimedia.org/T159266#3109954 (10Marostegui) Let's wait then to see how that ticket progress next week or so in order not to make Chris to replace it and then a few days later decom the se... [15:33:11] PROBLEM - Check systemd state on scb1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [15:37:44] (03PS1) 10Alexandros Kosiaris: service::uwsgi: Remove the log-route directives [puppet] - 10https://gerrit.wikimedia.org/r/343316 [15:40:10] (03CR) 10Alexandros Kosiaris: [C: 032] service::uwsgi: Remove the log-route directives [puppet] - 10https://gerrit.wikimedia.org/r/343316 (owner: 10Alexandros Kosiaris) [15:41:12] (03PS6) 10Volans: Add stages to set DB read-only/read-write mode [switchdc] - 10https://gerrit.wikimedia.org/r/343270 (https://phabricator.wikimedia.org/T160178) [15:42:11] PROBLEM - puppet last run on mw1171 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:44:10] urandom: still trying to figure out how to please megacli, but the new disk shows up as Failed [15:44:18] that is not what it should be [15:44:39] (something like Firmware state: Unconfigured(bad) would be more familiar to me) [15:45:07] cmjohnson1: hi! Do you have a min for restbase-dev1001? [15:46:55] elukey: you need to make it good [15:46:58] MegaCli -PDMakeGood -PhysDrv[E:S] -aN [15:46:59] Changes drive in state Unconfigured-Bad to Unconfigured-Good. [15:47:26] cmjohnson1: It shows up as Failed though, and trying -PDMakeGood fails [15:47:50] it says "Adapter: 0: Failed to change PD state at EnclId-32 SlotId-2." [15:48:00] (if I got the slot etc.. right) [15:48:58] slots right [15:49:00] (03PS7) 10Volans: Add stages to set DB read-only/read-write mode [switchdc] - 10https://gerrit.wikimedia.org/r/343270 (https://phabricator.wikimedia.org/T160178) [15:50:16] 11:46 < cmjohnson1> elukey: you need to make it good <-- that made me laugh [15:50:26] until i got the context [15:50:44] megacli has a sense of humor [15:50:57] MegaCli -PDMakeGood -PhysDrv[E:S] -aN <-- actually, that made me laugh harder [15:51:20] megacli -PDMakeAwesome [15:53:36] cmjohnson1: megacli -LDInfo -Lall -aALL says "state offline" that looks weird [15:54:24] these machines are using some amazon-purchased disk adapters, aren't they? [15:54:36] i am assuming someone offlined the disks a couple of weeks ago. [15:54:52] urandom no, they're using dell adapters iirc [15:55:01] kk [15:55:23] but it's possible [15:56:01] cmjohnson1: so maybe megacli -PDOnline -PhysDrv [32:2] -a0 ? [15:56:12] never used it [15:56:35] that is how you online it but ... 
I did that to db1060 last week and jacked it up so you may want to check [15:56:51] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 23487 [15:57:01] it was a mysql thing [15:58:19] tried :) now it is Online spun up [15:58:25] \o/ [16:00:37] 06Operations, 06Office-IT, 15User-Urbanecm: Request for email address seniori@wikimedia.org - https://phabricator.wikimedia.org/T160400#3110062 (10eross) Hi ! @Dzahn OIT can create the account seniori@wikimedia.org but we can't point it seniori@wikimedia.cz. due to the fact all our domain is .org. But if I... [16:02:04] urandom: well it is a bit strange because I expected another outcome, but let's see how it goes [16:02:28] urandom: we may want to review the partman config [16:02:34] because there are disks not used [16:02:44] and the same for disks are used for all the raids [16:03:16] by unused, you mean /dev/md1 ? [16:05:01] RECOVERY - MegaRAID on restbase-dev1001 is OK: OK: optimal, 4 logical, 4 physical [16:05:25] (03CR) 10Jcrespo: [C: 032] Add DROP privileges to testreduce databases to ssastry [puppet] - 10https://gerrit.wikimedia.org/r/343314 (https://phabricator.wikimedia.org/T160691) (owner: 10Jcrespo) [16:05:30] (03PS2) 10Jcrespo: Add DROP privileges to testreduce databases to ssastry [puppet] - 10https://gerrit.wikimedia.org/r/343314 (https://phabricator.wikimedia.org/T160691) [16:05:36] elukey: ^^^ :) [16:05:41] urandom: IIRC those hosts have more than 4 disks [16:06:00] (03PS2) 10Giuseppe Lavagetto: Add flake8 check to tox [switchdc] - 10https://gerrit.wikimedia.org/r/343002 (owner: 10Volans) [16:06:05] i think it's just the 4 now; do you see more? 
[16:06:09] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Add flake8 check to tox [switchdc] - 10https://gerrit.wikimedia.org/r/343002 (owner: 10Volans) [16:06:11] we changed disks [16:06:13] SSDs [16:06:33] mmmm I see Firmware state: Unconfigured(good), Spun Up in 8 slots, but it might be a red herring [16:06:36] okok [16:06:46] hashar: if/when you have a minute could you take a look at https://gerrit.wikimedia.org/r/#/c/343005/ ? [16:06:46] 06Operations, 06Office-IT, 15User-Urbanecm: Request for email address seniori@wikimedia.org - https://phabricator.wikimedia.org/T160400#3110079 (10Urbanecm) If creating full account (not just an alias) is better solution than alias it will fulfill my request. You may send login creds to martin.urbanec@wikime... [16:06:48] hrmm [16:06:48] (03PS1) 10Giuseppe Lavagetto: Switch old tasks to use remote.Remote [switchdc] - 10https://gerrit.wikimedia.org/r/343319 [16:07:37] (03PS3) 10Paladox: Fix some Debian lintian warnnings for the gerrit package [debs/gerrit] - 10https://gerrit.wikimedia.org/r/343297 [16:07:42] (03PS3) 10Eevans: Enable cqlsh client encryption [puppet] - 10https://gerrit.wikimedia.org/r/342679 (https://phabricator.wikimedia.org/T111113) [16:08:44] 06Operations, 10Traffic: Select or Acquire Address Space for Asia Cache DC - https://phabricator.wikimedia.org/T156256#2968867 (10faidon) After passing through Finance and Legal approval via Cobblestone, I submitted the APNIC form today. This is now being tracked by APNIC as #3102214 and in progress. [16:08:49] 06Operations, 10Traffic: Select or Acquire Address Space for Asia Cache DC - https://phabricator.wikimedia.org/T156256#3110087 (10faidon) a:03faidon [16:09:14] urandom, elukey there are 4 ssds in the slot 0-3 the orginal 2TB disks in the other slots are not in use afaik [16:09:26] aha [16:10:14] ah okok nice! So those were the old rotating disks [16:10:19] now it makes sense [16:10:25] "spinning rust" [16:10:44] urandom: what if I reimage the whole host? 
It could take less than all this mess [16:10:53] LVM, raid0, other raids to rebuild [16:11:02] ¯\_(ツ)_/¯ [16:11:07] elukey: that would be fine [16:11:11] RECOVERY - puppet last run on mw1171 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:11:15] could we use the rotating rust for booting? [16:11:19] saves ssd space [16:11:39] it doesn't save much [16:11:59] (03PS4) 10Paladox: Fix some Debian lintian warnnings for the gerrit package [debs/gerrit] - 10https://gerrit.wikimedia.org/r/343297 [16:12:05] not nearly enough to make an exception out of IMO [16:12:09] 4x~10-20G or so I guess [16:12:41] yeah, wouldn't make sense for a single host only [16:13:12] another downside is that those rotating disks are more likely to fail [16:13:25] (03PS8) 10Volans: Add stages to set DB read-only/read-write mode [switchdc] - 10https://gerrit.wikimedia.org/r/343270 (https://phabricator.wikimedia.org/T160178) [16:13:41] although.. given how quickly this intel ssd crapped out.. [16:15:44] (03PS4) 10Eevans: Enable cqlsh client encryption [puppet] - 10https://gerrit.wikimedia.org/r/342679 (https://phabricator.wikimedia.org/T111113) [16:16:05] !log reimage restbase-dev1001.eqiad.wmnet [16:16:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:16:27] (03PS9) 10Volans: Add stages to set DB read-only/read-write mode [switchdc] - 10https://gerrit.wikimedia.org/r/343270 (https://phabricator.wikimedia.org/T160178) [16:16:39] (03PS5) 10Paladox: Fix some Debian lintian warnnings for the gerrit package [debs/gerrit] - 10https://gerrit.wikimedia.org/r/343297 [16:16:40] elukey: thank you for helping out with this! 
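A hedged aside on the disk-state check discussed above: the exchange hinges on reading the "Firmware state" of each physical drive (Failed, Unconfigured(good), Online) per enclosure and slot. The sketch below tallies those states from text shaped like `megacli -PDList -aALL` output. The sample text is hand-written for illustration and was not captured from restbase-dev1001, and the helper names `parse_pd_states` and `summarize` are hypothetical.

```python
from collections import Counter

# Illustrative sample only: hand-written in the style of `megacli -PDList -aALL`
# output, NOT captured from restbase-dev1001.
SAMPLE = """\
Enclosure Device ID: 32
Slot Number: 0
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 2
Firmware state: Failed
Enclosure Device ID: 32
Slot Number: 4
Firmware state: Unconfigured(good), Spun Up
"""


def parse_pd_states(text):
    """Return {(enclosure, slot): firmware_state} from PDList-style text."""
    states = {}
    enclosure = slot = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            enclosure = value
        elif key == "Slot Number":
            slot = value
        elif key == "Firmware state":
            # The state line closes one drive record.
            states[(enclosure, slot)] = value
    return states


def summarize(states):
    """Count drives per firmware state, ignoring the ', Spun Up' suffix."""
    return Counter(state.split(",")[0] for state in states.values())


states = parse_pd_states(SAMPLE)
summary = summarize(states)
# Drives that would need attention before the array can be rebuilt.
not_online = sorted(k for k, v in states.items() if not v.startswith("Online"))
print(summary)
print(not_online)
```

Keying on (enclosure, slot) mirrors the `[E:S]` addressing used by the `-PhysDrv` argument quoted in the log, so a drive flagged here maps directly onto the argument a follow-up command would need.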
[16:17:37] urandom: I hope to fix it and not to make it worse :D [16:17:56] elukey: it was out of commission, there is nowhere to go but up :) [16:18:26] (03PS6) 10Paladox: Fix some Debian lintian warnnings for the gerrit package [debs/gerrit] - 10https://gerrit.wikimedia.org/r/343297 [16:18:46] PROBLEM - MariaDB disk space on labsdb1003 is CRITICAL: DISK CRITICAL - free space: /srv 185942 MB (5% inode=99%) [16:19:02] (03PS5) 10Eevans: Enable cqlsh client encryption [puppet] - 10https://gerrit.wikimedia.org/r/342679 (https://phabricator.wikimedia.org/T111113) [16:19:28] (03PS10) 10Giuseppe Lavagetto: Add stages to set DB read-only/read-write mode [switchdc] - 10https://gerrit.wikimedia.org/r/343270 (https://phabricator.wikimedia.org/T160178) (owner: 10Volans) [16:20:01] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Add stages to set DB read-only/read-write mode [switchdc] - 10https://gerrit.wikimedia.org/r/343270 (https://phabricator.wikimedia.org/T160178) (owner: 10Volans) [16:20:25] urandom: I'll let you know when it is completed! [16:20:43] ^ jynus marostegui what's the normal procedure for swatting down temp usage (I assume this is?) on labsdb1003? 
[16:20:46] RECOVERY - MariaDB disk space on labsdb1003 is OK: DISK OK [16:20:50] huh ok [16:21:00] oh it's right on the edge I see [16:21:10] <_joe_> chasemp: killing the relevant queries usually does it [16:21:34] the important part is to make it look like it was an accident :-) [16:21:46] let me give it a proper look [16:21:56] and see what is taking so much space [16:22:08] jynus: [16:22:13] https://www.irccloud.com/pastebin/MFP9KvXa/ [16:22:21] (03CR) 10Eevans: "Puppet compiler output: http://puppet-compiler.wmflabs.org/5816" [puppet] - 10https://gerrit.wikimedia.org/r/342679 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [16:22:23] (03PS7) 10Paladox: Fix some Debian lintian warnnings for the gerrit package [debs/gerrit] - 10https://gerrit.wikimedia.org/r/343297 [16:22:25] sqldata seems to be 412G [16:22:56] yeah, but that doesn't say much, that and tokudb are collectively mysql :-/ [16:22:59] right [16:23:23] that just leaves the log files and tmp/ [16:24:46] swap is over the roof [16:24:51] that server is about to crash [16:25:36] https://grafana-admin.wikimedia.org/dashboard/file/server-board.json?var-server=labsdb1003&var-network=eth0&from=now-24h&to=now [16:26:00] (03PS8) 10Paladox: Fix some Debian lintian warnnings for the gerrit package [debs/gerrit] - 10https://gerrit.wikimedia.org/r/343297 [16:26:57] (03PS9) 10Paladox: Fix some Debian lintian warnnings for the gerrit package [debs/gerrit] - 10https://gerrit.wikimedia.org/r/343297 [16:27:12] I need to reduce max-execution-time [16:27:41] jynus: I see - it does look like swap usage is climbing, but not the highest the server has seen, even in the last week [16:27:52] I see that [16:27:57] but it shouldn't swap in the first place [16:28:06] ah okay [16:29:17] an old friend of us [16:29:30] is taking 122 GB [16:29:37] s51187__xtools_tmp [16:30:39] jynus: how do you look for user specific usage? [16:30:52] Can any admin help with https://phabricator.wikimedia.org/T151296#2823897 ? 
I need a regenerated replica.my.cnf file for my shell user so that I can access the database replicas [16:31:03] I don't have tools- just look at /srv/sqldata [16:31:15] and check the subdirs inside with du [16:31:26] the ones that start with sXXXXX__ [16:31:28] are user-created [16:31:44] it could also be temporary tables [16:31:50] mflow: can you ask on #wikimedia-labs? That would be the more relevant channel :) [16:32:06] madhuvishy: alright, thanks, I already did [16:32:11] I am trying to grep them, but lsof may take too much time [16:33:13] 06Operations, 06Office-IT, 15User-Urbanecm: Request for email address seniori@wikimedia.org - https://phabricator.wikimedia.org/T160400#3110170 (10Dzahn) https://support.google.com/a/answer/6297084 ... Initial step: Go to Gmail advanced settings in the Google Admin console From the Admin console dashboard,... [16:34:49] if this was production- even a master- I would reboot mysql here [16:35:07] I just am not sure if it would come back after it :-/ [16:36:19] chasemp, madhuvishy what would you think of doing a semi-emergency maintenance during the night? [16:36:26] 06Operations, 10Traffic: Select or Acquire Address Space for Asia Cache DC - https://phabricator.wikimedia.org/T156256#3110176 (10BBlack) [16:36:57] it is either that, or let it explode on its own :-) [16:37:09] jynus: as in now? looks like we have to then :) [16:37:13] jynus, I think do what you gotta do. [16:37:13] no [16:37:16] now now [16:37:32] we can give users some hours and do it in 11 hours or so [16:37:42] 06Operations, 06Labs, 13Patch-For-Review: Phase out the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module - https://phabricator.wikimedia.org/T120159#3110177 (10yuvipanda) The puppet module is still present - although documentaiton has been updated to point people to the p... 
[16:37:52] when the usage would be at its lowest [16:38:11] (03PS10) 10Paladox: Fix some Debian lintian warnnings for the gerrit package [debs/gerrit] - 10https://gerrit.wikimedia.org/r/343297 [16:39:41] 06Operations, 10Traffic: Select or Acquire Address Space for Asia Cache DC - https://phabricator.wikimedia.org/T156256#3110186 (10BBlack) [16:39:53] jynus, open to what is best for you and Manuel. We are all here now if it's urgent and I would vote to take action as needed [16:40:10] +1 [16:40:24] I do not know, to be honest [16:40:31] I think this is going to crash soon [16:41:01] but it is okish now [16:41:21] RECOVERY - MD RAID on restbase-dev1001 is OK: OK: Active: 12, Working: 12, Failed: 0, Spare: 0 [16:41:23] let's maybe send an email saying we will go to maintenance next week [16:41:37] and do a proper upgrade and all that [16:41:51] (03CR) 10Hashar: [C: 031] Enable mcrypt extension on CI slaves [puppet] - 10https://gerrit.wikimedia.org/r/343223 (owner: 10Ejegg) [16:42:05] jynus: okay. anything we can do to temporarily patch things for the next 2-3 days? 
[16:42:15] the patch is done already [16:42:18] ah okay [16:42:24] I restricted query time to 1 hour [16:42:31] to lower the load [16:42:38] but some people may not be too happy about that [16:42:52] I see [16:43:01] the other thing we can do now [16:43:13] is understand disk usage growth [16:43:31] (03PS2) 10Giuseppe Lavagetto: Add IRC/SAL logging [switchdc] - 10https://gerrit.wikimedia.org/r/343078 (owner: 10Volans) [16:43:45] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Add IRC/SAL logging [switchdc] - 10https://gerrit.wikimedia.org/r/343078 (owner: 10Volans) [16:44:11] RECOVERY - Check systemd state on restbase-dev1001 is OK: OK - running: The system is fully operational [16:44:18] urandom: --^ [16:44:26] I mentioned xtools [16:44:34] because that may be a mistake on their part [16:44:36] (03PS11) 10Paladox: Fix some Debian lintian warnnings for the gerrit package [debs/gerrit] - 10https://gerrit.wikimedia.org/r/343297 [16:44:41] RECOVERY - cassandra-b service on restbase-dev1001 is OK: OK - cassandra-b is active [16:44:43] (we had some on their side) [16:44:49] let me search the logs [16:44:57] the phab task [16:45:07] elukey: awesome! [16:45:52] madhuvishy, https://phabricator.wikimedia.org/T133321 [16:46:01] urandom: just checked, everything looks good, but if you want to double check it would be great [16:46:21] PROBLEM - puppet last run on ms-fe1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [16:47:25] jynus: right, let me poke them [16:47:31] RECOVERY - cassandra-a service on restbase-dev1001 is OK: OK - cassandra-a is active [16:47:37] (03PS1) 10Yuvipanda: labstore: Track PAWS user storage too with Diamond [puppet] - 10https://gerrit.wikimedia.org/r/343320 (https://phabricator.wikimedia.org/T160114) [16:47:44] note my comment here: https://phabricator.wikimedia.org/T133321#2979483 [16:48:37] 06Operations, 10ops-eqiad, 06Services (watching): Degraded RAID on restbase-dev1001 - https://phabricator.wikimedia.org/T157425#3110227 (10elukey) 05Open>03Resolved `megacli -LDInfo -Lall -aALL` showed state `Offline` for the new disk slot so I forced it online with `megacli -PDOnline -PhysDrv [32:2] -a0... [16:49:08] (03PS12) 10Paladox: Fix some Debian lintian warnnings for the gerrit package [debs/gerrit] - 10https://gerrit.wikimedia.org/r/343297 [16:50:43] (03PS4) 10Paladox: Add mariadb-java-client [debs/gerrit] - 10https://gerrit.wikimedia.org/r/336002 (https://phabricator.wikimedia.org/T145885) [16:51:08] madhuvishy: also can you look at https://gerrit.wikimedia.org/r/343320 when you have the time? no rush :) [16:51:21] RECOVERY - cassandra-a SSL 10.64.0.36:7001 on restbase-dev1001 is OK: SSL OK - Certificate restbase-dev1001-a valid until 2018-01-05 22:53:02 +0000 (expires in 294 days) [16:51:36] elukey: i think everything looks good [16:52:21] yuvipanda: also may be needs build_prefix_depth if you want the prefix to be paws. or something like that [16:52:42] madhuvishy: ah, how do I figure out which number to put there? 
[16:53:01] urandom: goood [16:53:12] RECOVERY - cassandra-a CQL 10.64.0.36:9042 on restbase-dev1001 is OK: TCP OK - 0.000 second response time on 10.64.0.36 port 9042 [16:53:40] yuvipanda: it builds from exp/project/tools/project/paws/userhomes/* - so if you want paws., and you have custom_prefix paws, prefix depth is 1 [16:54:21] RECOVERY - cassandra-b SSL 10.64.0.37:7001 on restbase-dev1001 is OK: SSL OK - Certificate restbase-dev1001-b valid until 2018-01-05 22:53:03 +0000 (expires in 294 days) [16:54:44] madhuvishy, jynus: I poked musikanimal to see if he can help figure out if the xtools cleanup jobs have broken for some reason. [16:54:45] (03PS5) 10Paladox: Add mariadb-java-client [debs/gerrit] - 10https://gerrit.wikimedia.org/r/336002 (https://phabricator.wikimedia.org/T145885) [16:54:54] yeah looking into it [16:55:02] it could be related to precise deprecation I suppose? [16:55:05] bd808, I susspect maybe [16:55:08] bd808: cool thanks [16:55:09] (03PS5) 10Paladox: Gerrit: Make sure any services under the gerrit2 user are stopped [debs/gerrit] - 10https://gerrit.wikimedia.org/r/342082 [16:55:09] labsdb1001 [16:55:10] ah, yes it was [16:55:11] (03PS2) 10Yuvipanda: labstore: Track PAWS user storage too with Diamond [puppet] - 10https://gerrit.wikimedia.org/r/343320 (https://phabricator.wikimedia.org/T160114) [16:55:11] RECOVERY - cassandra-b CQL 10.64.0.37:9042 on restbase-dev1001 is OK: TCP OK - 0.000 second response time on 10.64.0.37 port 9042 [16:55:12] got cleaned up [16:55:16] madhuvishy: thanks! updated! [16:55:28] but labsdb1003 continued growing? 
[16:55:30] (03CR) 10Paladox: "> > When uninstalling with the service already working it will fail" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/342082 (owner: 10Paladox) [16:55:35] and it took a bit more to fill up [16:56:10] there are files with 2015 year there, so that would fit [16:56:18] ok I ran the script, it deleted some 20-30 tabels [16:56:20] *tables [16:56:42] musikanimal, are you sure you are using the right server? [16:56:52] elukey: thanks again, this was really helpful! [16:56:58] because there are files on the other server too (I think it is where wikidata is) [16:57:03] musikanimal: yeah labsdb1003 is what we are currently talking about [16:57:09] hmm ok [16:57:10] urandom: yw! [16:57:45] and I saw no decrease in space used there [16:58:32] (03PS3) 10Madhuvishy: labstore: Track PAWS user storage too with Diamond [puppet] - 10https://gerrit.wikimedia.org/r/343320 (https://phabricator.wikimedia.org/T160114) (owner: 10Yuvipanda) [16:58:40] (03CR) 10Madhuvishy: [C: 032] labstore: Track PAWS user storage too with Diamond [puppet] - 10https://gerrit.wikimedia.org/r/343320 (https://phabricator.wikimedia.org/T160114) (owner: 10Yuvipanda) [16:58:44] the other database with more space used than wikis is u3532__ [16:59:53] yuvipanda: looks good! merging and running puppet on labstore1005 - should pick it up and log [16:59:53] they look like 1:1 copies of wikis [17:00:33] \o/ ty [17:02:24] yuvipanda: I only +2-ed, haven't merged, jfyi [17:03:07] so what database on labsdb1003 is related to xtools? [17:03:24] wikidata, for example [17:03:38] you can do SELECT @@hostname to be 100% sure [17:04:11] oh, you mean your database? 
[17:04:20] yeah [17:04:33] s51187__xtools_tmp [17:04:34] these looks like the replicas [17:04:40] *look [17:04:57] 06Operations, 10Domains, 10Traffic, 13Patch-For-Review: Using wikimedia.ee mail address as Google account - https://phabricator.wikimedia.org/T158638#3110288 (10Dzahn) @Kaarel_Vaidla @Beetlebeard I can make the change but i wanted to check with you one last time that it is ok instead of just changing it a... [17:04:59] oh I see [17:05:19] so there was one on labsdb1001, and that is ok, I think, and there is another on labsdb1003 [17:06:06] "sql wikidatawiki" and then "use s51187__xtools_tmp" should get you there [17:06:07] boom [17:06:33] hope that is a good "boom" [17:06:35] I changed the host on the cleanup script then re-ran it [17:06:38] yeah haha [17:06:39] :-) [17:06:50] down to 2.6G now [17:06:58] thanks musikanimal :) [17:06:59] wow, thank you very much! [17:07:02] going to make the script do both hosts moving forward [17:07:09] \o/ [17:07:10] thanks a lot! [17:07:13] np, sorry this keeps happening! [17:07:24] look, you are maintaining it [17:07:29] that is the only thing we care about [17:07:40] as long as there is someone there taking care of it [17:07:51] we do not care about the occasional hiccups :-) [17:07:58] for the record, Matthew_ and Community Tech are rebuilding all of XTools from the ground up, and we should have issues like this in the new version [17:08:11] *shouldn't [17:08:16] thanks [17:08:23] awesome! [17:08:32] so the old crappy one will be laid to rest before too long [17:08:49] musikanimal: Hey! [17:08:55] hi! [17:08:55] Was my cleanup script busted? [17:09:41] well, 1) we didn't update the cron to run on trusty and 2) it didn't purge the tables in labsdb1003 in addition to s1.labsdb [17:09:52] no worries though, easy fix [17:10:01] Okay, cool. 
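A hedged aside on the disk-usage hunt that wraps up above: the approach described in channel was to look under /srv/sqldata and run du on the subdirectories whose names start with sXXXXX__, since those are user-created. The sketch below does the same tally in Python against a throwaway directory. It assumes user databases are directories named with an s or u prefix, digits, and a double underscore; the `u3532__copy` name and the helper names are hypothetical, and the sizes are apparent file sizes rather than the disk blocks that du reports.

```python
import os
import re
import tempfile

# Assumption: user databases are directories named like s51187__xtools_tmp or
# u3532__<something>, as the names mentioned in the log suggest.
USER_DB = re.compile(r"^[su]\d+__")


def dir_size(path):
    """Sum apparent file sizes (bytes) below path; `du` counts disk blocks instead."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def user_db_usage(datadir):
    """Return [(bytes, dirname)] for user-created databases, largest first."""
    usage = []
    for entry in os.scandir(datadir):
        if entry.is_dir(follow_symlinks=False) and USER_DB.match(entry.name):
            usage.append((dir_size(entry.path), entry.name))
    return sorted(usage, reverse=True)


# Demo on a throwaway directory rather than a real /srv/sqldata.
with tempfile.TemporaryDirectory() as demo:
    for name, size in [("s51187__xtools_tmp", 4096),
                       ("u3532__copy", 1024),   # hypothetical database name
                       ("enwiki", 8192)]:       # replica data, not user-created
        os.mkdir(os.path.join(demo, name))
        with open(os.path.join(demo, name, "t.ibd"), "wb") as fh:
            fh.write(b"\0" * size)
    report = user_db_usage(demo)

for size, name in report:
    print(f"{size:>10} {name}")
```

Filtering by name first keeps the walk away from the replicated wiki databases, which the log notes are indistinguishable from user data when only the total size of sqldata is inspected.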
[17:10:32] 06Operations, 06Discovery, 06Labs, 03Interactive-Sprint, 06Maps (Maps-data): PostgreSQL query planner bug on labsdb1006 - https://phabricator.wikimedia.org/T145599#3110325 (10MaxSem) 05Open>03Resolved a:03MaxSem Works after the servers were upgraded to PG 9.4. [17:11:12] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3110330 (10MaxSem) The new server works for me. The upgrade also resolved T145599. Thank you! [17:11:34] 06Operations, 06Discovery, 06Labs, 03Interactive-Sprint, 06Maps (Maps-data): PostgreSQL query planner bug on labsdb1006 - https://phabricator.wikimedia.org/T145599#2635546 (10jcrespo) \o/ [17:12:16] (03CR) 10Dzahn: [C: 031] "this looks alright, but also needs checking on the nodepool instances. i talked briefly to hashar who pointed out they are refreshed once" [puppet] - 10https://gerrit.wikimedia.org/r/343223 (owner: 10Ejegg) [17:13:23] (03CR) 10Hashar: [C: 031] "There is a slight chance that enable the mcrypt extension ends up triggering fault in MediaWiki related jobs. Though I haven't looked at" [puppet] - 10https://gerrit.wikimedia.org/r/343223 (owner: 10Ejegg) [17:15:21] RECOVERY - puppet last run on ms-fe1001 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [17:15:47] jynus: let us know when you'd like to schedule maintenance and if we should send out an email [17:16:04] 06Operations, 10ops-codfw, 13Patch-For-Review, 15User-Elukey: codfw: mw2251-mw2260 rack/setup - https://phabricator.wikimedia.org/T155180#3110364 (10Papaul) Service Request Information: Dispatch Information: Customer Information: Dispatch Number: 324983627 Service Tag: FXLPND2 Service Request Numbe... 
[17:16:12] yeah, let me finish a couple of things, and I will send an email to the labs admins [17:22:50] jynus: okay, thank you :) [17:39:55] (03PS9) 10Giuseppe Lavagetto: Add stages to manage maintenance [switchdc] - 10https://gerrit.wikimedia.org/r/342806 (https://phabricator.wikimedia.org/T160178) [17:39:57] (03PS2) 10Giuseppe Lavagetto: Switch old tasks to use remote.Remote [switchdc] - 10https://gerrit.wikimedia.org/r/343319 [17:43:21] 06Operations, 10Domains, 10Traffic, 06WMF-Legal, 13Patch-For-Review: Using wikimedia.ee mail address as Google account - https://phabricator.wikimedia.org/T158638#3110446 (10Dzahn) [17:57:21] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [17:58:21] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:01:48] (03CR) 10Volans: [C: 04-1] "Few minor things, see inline" (035 comments) [switchdc] - 10https://gerrit.wikimedia.org/r/342806 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [18:03:13] random wikipedia user says: [18:03:13] Your JavaScript is slowing down your page time which leads to less user friendly module At [18:03:16] https://www.wikipedia.org/portal/wikipedia.org/assets/js/index-d1cc91a7f4.js [18:03:22] but that file does not even exist? 
[18:03:44] mutante, I think they change the name of that to avoid caching issues [18:03:52] ah, that makes sense [18:04:11] tell the user to file a ticket with discovery [18:04:15] i wonder if i should make a ticket for that, they just said the above and mailed dns-admin [18:04:19] ok [18:04:25] they are I think actually working on improving the portal [18:04:36] *nod*, thanks [18:04:57] the portal pages, right [18:05:33] there may be a ticket already [18:05:37] let me search [18:06:03] see https://wikitech.wikimedia.org/wiki/Incident_documentation/20170222-www-portals#Actionables [18:06:46] cool, i remember that incident. thx [18:07:21] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [18:08:47] ~1200 lines of unminified JS for the portal? :P [18:10:21] bblack, on the other side - if I am being dumb and not really meaning it - who uses wikipedia.org? :-) [18:11:31] I'll take the point of that unminified JS... [18:12:02] jan_drewniak: above discussion might interest you... [18:25:27] (03CR) 10Smalyshev: "looks ok to me but I fear I have not enough knowledge to see if it's correct or not. Can we test it somehow?" [puppet] - 10https://gerrit.wikimedia.org/r/343302 (owner: 10Gehel) [18:26:22] RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [18:27:31] (03CR) 10Gehel: "I have a test vm (wd-deploy2) to test on labs (but without scap3 enabled). I'll do a dry-run when deploying on prod to ensure nothing chan" [puppet] - 10https://gerrit.wikimedia.org/r/343302 (owner: 10Gehel) [18:28:32] jynus: you'd be surprised at how much traffic arrives on wikimedia.org... [18:29:02] but I supposed everyone used bing! [18:29:34] :-) [18:30:15] * gehel isn't sure why people are going to wikimedia.org [18:30:39] we tried to do a survey to understand, but I'm not sure what the conclusions were... 
[18:31:47] mutante: I just checked, that JS file is actually minified (or someone is writing some really ugly JS) [18:32:35] do people search wikipedia on wikipedia.org ? [18:32:46] they actually do! [18:33:01] you should ask debt for the whole story... [18:34:41] According to Wikipedia, Wikipedia receives "between 25,000 and 60,000 page requests per second" [18:34:53] "page requests are first passed to a front-end layer of Squid caching servers." [18:35:29] * jynus tempting bblack to edit [18:41:50] jynus: I actually planned to fix that soon [18:42:05] that information is super old.. [18:49:11] Hi jynus - take a look at our Wikipedia portal dashboard - we totally get lots of visitors! :) http://discovery.wmflabs.org/portal/#pageview_tab [18:59:18] 06Operations, 10Analytics, 10DBA: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3110616 (10GLCiampaglia) I am pretty sure my tables can be safely deleted. Thanks for the heads up! Giovanni [18:59:21] PROBLEM - puppet last run on db1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:02:15] (03PS4) 10Yuvipanda: labstore: Track PAWS user storage too with Diamond [puppet] - 10https://gerrit.wikimedia.org/r/343320 (https://phabricator.wikimedia.org/T160114) [19:15:31] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:27:21] RECOVERY - puppet last run on db1043 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [19:44:31] RECOVERY - puppet last run on mw1189 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [19:50:57] (03CR) 10Hashar: [C: 031] "On second thought: that seems to be for the job wikimedia-fundraising-civicm which is still on the permanent slave. 
So one can cherry pi" [puppet] - 10https://gerrit.wikimedia.org/r/343223 (owner: 10Ejegg) [19:56:31] PROBLEM - puppet last run on elastic1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:06:51] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:09:41] PROBLEM - puppet last run on thumbor1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:09:52] (03PS9) 10BBlack: [WIP/POC] DNS zones to puppet repo [puppet] - 10https://gerrit.wikimedia.org/r/342887 [20:15:51] PROBLEM - restbase endpoints health on restbase-dev1001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.35, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by NewConnectionError(urllib3.connection.HTTPConnection object at 0x7f6dd953d950: Failed to establish a new connection: [Errno 111] Connection refused,)) [20:16:01] PROBLEM - Restbase root url on restbase-dev1001 is CRITICAL: connect to address 10.64.0.35 and port 7231: Connection refused [20:21:59] 06Operations, 10Analytics, 10Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#3110845 (10Nuria) Design document available in meta: https://meta.wikimedia.org/wiki/Research:PrivacyConsciousABTestingAtWikimediaFoundation [20:22:42] 06Operations, 06Analytics-Kanban, 06Performance-Team, 06Reading-Admin, 10Traffic: Preliminary Design document for A/B testing - https://phabricator.wikimedia.org/T143694#3110846 (10Nuria) Design document available in beta: https://meta.wikimedia.org/wiki/Research:PrivacyConsciousABTestingAtWikimediaFound... 
[20:22:52] (PS1) RobH: decom ms-fe2001 through ms-fe2004 [puppet] - https://gerrit.wikimedia.org/r/343331
[20:23:45] Operations, ops-codfw, hardware-requests, Patch-For-Review: Decomission ms-fe2001-4 - https://phabricator.wikimedia.org/T159413#3110852 (RobH)
[20:24:31] RECOVERY - puppet last run on elastic1021 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[20:27:48] Operations, Analytics-Cluster, Analytics-Kanban: Reinstall Analytics Hadoop Cluster with Debian Jessie - https://phabricator.wikimedia.org/T157807#3110867 (Nuria)
[20:27:51] Operations, Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Reimage a Trusty Hadoop worker to Debian jessie - https://phabricator.wikimedia.org/T159530#3110866 (Nuria) Open>Resolved
[20:35:51] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[20:37:41] RECOVERY - puppet last run on thumbor1001 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[20:40:42] (CR) RobH: [C: 2] decom ms-fe2001 through ms-fe2004 [puppet] - https://gerrit.wikimedia.org/r/343331 (owner: RobH)
[20:40:45] Operations, ops-codfw, hardware-requests, Patch-For-Review: Decomission ms-fe2001-4 - https://phabricator.wikimedia.org/T159413#3110890 (RobH)
[20:46:46] (CR) Andrew Bogott: [C: 1] Harmomise group type for LDAP admin access [puppet] - https://gerrit.wikimedia.org/r/342008 (https://phabricator.wikimedia.org/T157131) (owner: Muehlenhoff)
[20:48:55] Operations, ops-codfw, hardware-requests, Patch-For-Review: Decomission ms-fe2001-4 - https://phabricator.wikimedia.org/T159413#3110916 (RobH)
[20:50:50] (PS1) RobH: decom of ms-fe200[1-4] [dns] - https://gerrit.wikimedia.org/r/343334
[20:51:31] Operations, ops-codfw, hardware-requests, Patch-For-Review: Decomission ms-fe2001-4 - https://phabricator.wikimedia.org/T159413#3110932 (RobH)
[20:52:04] (CR) RobH: [C: 2] decom of ms-fe200[1-4] [dns] - https://gerrit.wikimedia.org/r/343334 (owner: RobH)
[20:53:43] Operations, ops-codfw, hardware-requests: Decomission ms-fe2001-4 - https://phabricator.wikimedia.org/T159413#3110935 (RobH) a:RobH>Papaul Assigning to @papaul for the remainder of the steps. If these are HDDs, please wipe. If SSDs, we'll need to investigate using new SSD trim support wipe...
[20:59:01] Hey, I has a quick question that someone here might know the answer to....
[21:00:11] Using the API for ‘transcode reset’ (TimedMediaHandler) does not work using a ‘botpassword’, you has to login as the actual account, which seems odd but meh.
[21:00:47] I’m just really, REALLY hoping (since it’s not actually an edit) that blocked accounts can’t use it.
[21:02:14] (hrms) I guess I could test it.
[21:03:49] Ouch.
[21:03:55] (CR) Dzahn: "> This should be gerrit/server.yaml?" [puppet] - https://gerrit.wikimedia.org/r/342692 (owner: Dzahn)
[21:04:28] (CR) Dzahn: gerrit: convert to profile/role structure (1 comment) [puppet] - https://gerrit.wikimedia.org/r/342692 (owner: Dzahn)
[21:05:36] (CR) Dzahn: "ok, got the point about having to reconfigure labs instances. let me see if we can keep the role name the same while doing the rest of it." [puppet] - https://gerrit.wikimedia.org/r/342692 (owner: Dzahn)
[21:07:20] brion: I know it’s late, but ping real quick?
[21:07:32] Just want to mention something.
[21:07:57] Operations, Office-IT, User-Urbanecm: Request for email address seniori@wikimedia.org - https://phabricator.wikimedia.org/T160400#3111002 (eross) Thank you @Dzahn ! @Urbanecm I was able to create the account; I sent a couple of test emails including my personal email for the reroute and it was succe...
[21:08:09] Yo
[21:08:45] Hah. That seems like something to fix
[21:08:50] brion: ^ what I just commented about, I tested it (by blocking my ‘pet’ account) and it works… seems… suboptimal.
[21:09:30] Revent: can you drop me a mail to remind me to patch that? Should be easy fix
[21:09:36] Bvibber@wikimedia.org
[21:09:42] Sure.
[21:09:44] Thanks
[21:10:24] Too many inboxes ;)
[21:12:39] Operations, Office-IT, User-Urbanecm: Request for email address seniori@wikimedia.org - https://phabricator.wikimedia.org/T160400#3097403 (Peachey88) >>! In T160400#3110170, @Dzahn wrote: > https://support.google.com/a/answer/6297084 > ... > Initial step: Go to Gmail advanced settings in the Google...
[21:13:40] brion: FYI, I’m pushing the ‘uninitialized’ transcodes on Commons (that showed up after purging all those pages) back through in chunks.
[21:19:33] Ok, that should keep servers busy for a while longer. :)
[21:19:42] And... looks like the low res backfills are done, good
[21:27:00] (PS3) Andrew Bogott: Bootstrapvz: Simplify and update [puppet] - https://gerrit.wikimedia.org/r/343208
[21:27:02] (PS4) Andrew Bogott: Keystonehooks: Exclude 'novaobserver' user from posix user group. [puppet] - https://gerrit.wikimedia.org/r/343074 (https://phabricator.wikimedia.org/T158650)
[21:27:04] (PS1) Andrew Bogott: Designate: Don't use keystone to resolve project id [puppet] - https://gerrit.wikimedia.org/r/343356 (https://phabricator.wikimedia.org/T158650)
[21:28:27] (CR) jerkins-bot: [V: -1] Designate: Don't use keystone to resolve project id [puppet] - https://gerrit.wikimedia.org/r/343356 (https://phabricator.wikimedia.org/T158650) (owner: Andrew Bogott)
[21:29:43] (PS15) Dzahn: gerrit: convert to profile/role structure [puppet] - https://gerrit.wikimedia.org/r/342692
[22:00:58] (CR) Milimetric: [C: 1] "ottomata, this is safe to merge" [puppet] - https://gerrit.wikimedia.org/r/343246 (owner: Catrope)
[22:02:45] Operations, Office-IT, User-Urbanecm: Request for email address seniori@wikimedia.org - https://phabricator.wikimedia.org/T160400#3111179 (Dzahn) nice that it worked, and thank you very much for taking it
[22:03:37] PROBLEM - puppet last run on wtp1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:04:40] Operations, Office-IT, User-Urbanecm: Request for email address seniori@wikimedia.org - https://phabricator.wikimedia.org/T160400#3097403 (Dzahn) a:eross
[22:13:07] (PS16) Dzahn: gerrit: convert to profile/role structure [puppet] - https://gerrit.wikimedia.org/r/342692
[22:22:23] (CR) Yuvipanda: [V: 2 C: 2] labstore: Track PAWS user storage too with Diamond [puppet] - https://gerrit.wikimedia.org/r/343320 (https://phabricator.wikimedia.org/T160114) (owner: Yuvipanda)
[22:22:31] (PS5) Yuvipanda: labstore: Track PAWS user storage too with Diamond [puppet] - https://gerrit.wikimedia.org/r/343320 (https://phabricator.wikimedia.org/T160114)
[22:22:36] (CR) Yuvipanda: [V: 2 C: 2] labstore: Track PAWS user storage too with Diamond [puppet] - https://gerrit.wikimedia.org/r/343320 (https://phabricator.wikimedia.org/T160114) (owner: Yuvipanda)
[22:27:37] (PS17) Dzahn: gerrit: convert to profile/role structure [puppet] - https://gerrit.wikimedia.org/r/342692
[22:31:37] RECOVERY - puppet last run on wtp1017 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[22:32:55] (CR) Paladox: gerrit: convert to profile/role structure (1 comment) [puppet] - https://gerrit.wikimedia.org/r/342692 (owner: Dzahn)
[22:34:26] (CR) Paladox: gerrit: convert to profile/role structure (1 comment) [puppet] - https://gerrit.wikimedia.org/r/342692 (owner: Dzahn)
[22:36:03] (CR) Paladox: gerrit: convert to profile/role structure (1 comment) [puppet] - https://gerrit.wikimedia.org/r/342692 (owner: Dzahn)
[22:37:36] (CR) Paladox: gerrit: convert to profile/role structure (2 comments) [puppet] - https://gerrit.wikimedia.org/r/342692 (owner: Dzahn)
[22:40:46] Did someone break puppet
[22:40:48] ?
[22:40:49] root@puppet-paladox:/var/lib/git/operations/puppet# sudo su git puppet
[22:40:58] root@puppet-paladox:/var/lib/git/operations/puppet# sudo su git puppet
[22:40:58] /usr/bin/puppet: line 5: .delete: command not found
[22:40:58] /usr/bin/puppet: line 7: require: command not found
[22:40:58] /usr/bin/puppet: line 8: Puppet::Util::CommandLine.new.execute: command not found
[22:41:01] mutante ^^
[22:41:20] never mind
[22:41:34] seems i accidentally added a whitespace
[22:46:10] paladox: when already root you don't need "sudo su" or even "su"
[22:46:23] git puppet is an alias?
[22:46:33] ah, ok
[22:47:09] one more PS ...
[22:47:36] oh
[22:47:40] (PS18) Dzahn: gerrit: convert to profile/role structure [puppet] - https://gerrit.wikimedia.org/r/342692
[22:48:54] Again… just as a ‘note’ (I mentioned this earlier)…
[22:51:58] I’m pushing the ‘uninitialized’ transcodes on Commons (ones that were never transcoded) through in chunks… there were about 40k, so it’s a lot of chunks. It’s going to cause the video scalers to peg high on load… I’m going to attempt to (though it’s hard to judge) push them through in small enough chunks that the servers can ‘catch up’ every few hours, so that new uploads can run.
[22:52:28] greg-g: Is it OK if I get RoanKattouw to do an emergency deploy for VE? UBN bug I really don't want to interfere with users' editing over the weekend. :-(
[22:52:52] Context: this was already believed to be deployed, but someone forgot to run git submodule update with --recursive
[22:53:11] *some of this
[22:53:11] RoanKattouw: Half of it.
[22:53:24] James_F: Did you write buggy code? Shame on you.
[22:53:26] :P
[22:53:48] Revent: Not me. I mostly harangued people into finding the fixes. :-)
[22:54:13] James_F: BTW, pm?
[22:54:36] Mostly so you can make sure brion remembers...
[22:54:48] Sure
[22:58:18] (PS10) BBlack: [POC] DNS zones to puppet repo [puppet] - https://gerrit.wikimedia.org/r/342887
[22:58:24] Operations, Analytics, DBA: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3111341 (Tbayer) I have been using `db1047` quite frequently for EventLogging queries as an alternative to `dbstore1002`, either because it was (at times) much faster, or in...
[22:59:57] (CR) BBlack: "PS10 is hypothetically feature-complete, but it's still completely "blind" work and untested at this point." [puppet] - https://gerrit.wikimedia.org/r/342887 (owner: BBlack)
[23:01:15] James_F: task?
[23:01:37] I see two VE UBN
[23:02:03] greg-g: https://phabricator.wikimedia.org/T160479 https://phabricator.wikimedia.org/T160190 https://phabricator.wikimedia.org/T154123 https://phabricator.wikimedia.org/T160197
[23:02:09] heh
[23:02:12] OK
[23:04:18] (PS19) Dzahn: gerrit: convert to profile/role structure [puppet] - https://gerrit.wikimedia.org/r/342692
[23:06:58] (CR) Chad: [C: -1] "I'm sounding like a broken record. Any rewrite rules is a non-starter imho. They broke it, they can fix it." [puppet] - https://gerrit.wikimedia.org/r/340900 (https://phabricator.wikimedia.org/T156120) (owner: Paladox)
[23:08:14] (CR) Paladox: gerrit: convert to profile/role structure (1 comment) [puppet] - https://gerrit.wikimedia.org/r/342692 (owner: Dzahn)
[23:13:53] (CR) Chad: Fix some Debian lintian warnnings for the gerrit package (2 comments) [debs/gerrit] - https://gerrit.wikimedia.org/r/343297 (owner: Paladox)
[23:15:20] (PS13) Paladox: Fix some Debian lintian warnnings for the gerrit package [debs/gerrit] - https://gerrit.wikimedia.org/r/343297
[23:15:43] (CR) Paladox: Fix some Debian lintian warnnings for the gerrit package (2 comments) [debs/gerrit] - https://gerrit.wikimedia.org/r/343297 (owner: Paladox)
[23:16:01] (PS14) Paladox: Fix some Debian lintian warnnings for the gerrit package [debs/gerrit] - https://gerrit.wikimedia.org/r/343297
[23:18:13] (PS20) Dzahn: gerrit: convert to profile/role structure [puppet] - https://gerrit.wikimedia.org/r/342692
[23:22:50] (CR) Paladox: gerrit: convert to profile/role structure (1 comment) [puppet] - https://gerrit.wikimedia.org/r/342692 (owner: Dzahn)
[23:24:17] PROBLEM - Kafka MirrorMaker main-eqiad_to_analytics on kafka1012 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_analytics/producer\.properties
[23:24:17] PROBLEM - Check systemd state on kafka1012 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[23:28:17] RECOVERY - Kafka MirrorMaker main-eqiad_to_analytics on kafka1012 is OK: PROCS OK: 1 process with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_analytics/producer\.properties
[23:28:17] RECOVERY - Check systemd state on kafka1012 is OK: OK - running: The system is fully operational
[23:30:59] !log lists: making Steinsplitter and Zhuyifei1999 list admins of commons-poty (T160672)
[23:31:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:31:06] T160672: Assign new listadmin for commons-poty - https://phabricator.wikimedia.org/T160672
[23:36:35] !log catrope@tin Synchronized php-1.29.0-wmf.16/extensions/VisualEditor/lib/ve: Fixes for T154123 T160479 T160190 T160197 (duration: 00m 42s)
[23:36:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:36:43] T154123: VisualEditor: when pasting wikified text from an article, the text style (including link) can't be modified - https://phabricator.wikimedia.org/T154123
[23:36:43] T160197: [Regression wmf.16] Toolbar is not floating as I scroll down in VE, Error in the console "Uncaught TypeError: Cannot read property 'center' of undefined" - https://phabricator.wikimedia.org/T160197
[23:36:43] T160190: [Regression wmf.16] Context menus for items below the screen appear broken (no vertical height) - https://phabricator.wikimedia.org/T160190
[23:36:43] T160479: [Regression wmf.16] Cursor jumps to the beginning of the page after adding a focusable node - https://phabricator.wikimedia.org/T160479
[23:37:44] greg-g: All done. Thank you.
[23:45:57] (PS21) Dzahn: gerrit: convert to profile/role structure [puppet] - https://gerrit.wikimedia.org/r/342692
[23:48:30] !log lists: creating new list wikispecies-admin (T159625)
[23:48:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:48:36] T159625: Wikispecies-Admins , a mail list for admins within Wikispecies - https://phabricator.wikimedia.org/T159625
[23:54:27] PROBLEM - puppet last run on wdqs1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:58:58] (CR) Dzahn: [C: 1] "finally no-op for realz http://puppet-compiler.wmflabs.org/5823/" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/342692 (owner: Dzahn)