[00:00:04] RoanKattouw ostriches Krenair: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160107T0000).
[00:02:14] PROBLEM - puppet last run on mc2015 is CRITICAL: CRITICAL: puppet fail
[00:09:29] empty slot, just as well
[00:18:24] PROBLEM - puppet last run on ganeti2004 is CRITICAL: CRITICAL: puppet fail
[00:22:53] (CR) DCausse: [C: 1] [elasticsearch] Record per-node fetch timing [puppet] - https://gerrit.wikimedia.org/r/262828 (owner: EBernhardson)
[00:27:04] RECOVERY - puppet last run on mc2015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:27:07] (PS2) Rush: [elasticsearch] Record per-node fetch timing [puppet] - https://gerrit.wikimedia.org/r/262828 (owner: EBernhardson)
[00:32:38] (PS1) MaxSem: Add my new key. Verified with Jaime in person [puppet] - https://gerrit.wikimedia.org/r/262843
[00:36:53] (CR) Rush: [C: 2] [elasticsearch] Record per-node fetch timing [puppet] - https://gerrit.wikimedia.org/r/262828 (owner: EBernhardson)
[00:37:26] (CR) Jcrespo: [C: 1] Add my new key. Verified with Jaime in person [puppet] - https://gerrit.wikimedia.org/r/262843 (owner: MaxSem)
[00:37:53] (PS2) Jcrespo: Add my new key. Verified with Jaime in person [puppet] - https://gerrit.wikimedia.org/r/262843 (owner: MaxSem)
[00:38:56] (CR) Jcrespo: [C: 2] Add my new key.
Verified with Jaime in person [puppet] - https://gerrit.wikimedia.org/r/262843 (owner: MaxSem)
[00:45:25] RECOVERY - puppet last run on ganeti2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:50:23] (PS1) Ori.livneh: Add testreduce module and role [puppet] - https://gerrit.wikimedia.org/r/262846 (https://phabricator.wikimedia.org/T118778)
[00:51:17] (CR) jenkins-bot: [V: -1] Add testreduce module and role [puppet] - https://gerrit.wikimedia.org/r/262846 (https://phabricator.wikimedia.org/T118778) (owner: Ori.livneh)
[00:56:39] (PS2) Ori.livneh: Add testreduce module and role [puppet] - https://gerrit.wikimedia.org/r/262846 (https://phabricator.wikimedia.org/T118778)
[00:57:42] (CR) jenkins-bot: [V: -1] Add testreduce module and role [puppet] - https://gerrit.wikimedia.org/r/262846 (https://phabricator.wikimedia.org/T118778) (owner: Ori.livneh)
[00:58:27] (PS3) Ori.livneh: Add testreduce module and role [puppet] - https://gerrit.wikimedia.org/r/262846 (https://phabricator.wikimedia.org/T118778)
[00:59:22] (CR) jenkins-bot: [V: -1] Add testreduce module and role [puppet] - https://gerrit.wikimedia.org/r/262846 (https://phabricator.wikimedia.org/T118778) (owner: Ori.livneh)
[00:59:46] (PS4) Ori.livneh: Add testreduce module and role [puppet] - https://gerrit.wikimedia.org/r/262846 (https://phabricator.wikimedia.org/T118778)
[01:02:17] (PS5) Ori.livneh: Add testreduce module and role [puppet] - https://gerrit.wikimedia.org/r/262846 (https://phabricator.wikimedia.org/T118778)
[01:06:01] (CR) Ori.livneh: [C: 2] Add testreduce module and role [puppet] - https://gerrit.wikimedia.org/r/262846 (https://phabricator.wikimedia.org/T118778) (owner: Ori.livneh)
[01:09:54] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: Puppet has 1 failures
[01:11:07] (PS1) Yuvipanda: Use wikimedia debian mirror [docker-images/debian] -
https://gerrit.wikimedia.org/r/262848
[01:17:04] (CR) Yuvipanda: [C: 2 V: 2] Use wikimedia debian mirror [docker-images/debian] - https://gerrit.wikimedia.org/r/262848 (owner: Yuvipanda)
[01:18:45] PROBLEM - DPKG on ruthenium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[01:20:51] operations, MediaWiki-Maintenance-scripts: WikiExporter does not respect groupLoadsByDB[$wiki]['dump'] - https://phabricator.wikimedia.org/T43668#1919390 (ArielGlenn) BackupDumper->backupDb() in maintenance/backup.inc gets the right db based on the dumps group, and passes this to WikiExporter. This has b...
[01:22:28] (PS1) Ori.livneh: testreduce: declare log directory [puppet] - https://gerrit.wikimedia.org/r/262851
[01:22:41] (CR) Ori.livneh: [C: 2 V: 2] testreduce: declare log directory [puppet] - https://gerrit.wikimedia.org/r/262851 (owner: Ori.livneh)
[01:23:44] operations, MediaWiki-Maintenance-scripts: WikiExporter does not respect groupLoadsByDB[$wiki]['dump'] - https://phabricator.wikimedia.org/T43668#1919395 (jcrespo) Open>Resolved a:jcrespo Not relevant anymore.
[01:24:15] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[01:24:36] (CR) Jforrester: [C: -1] "I very strongly object to making the wiki worse in this way." [mediawiki-config] - https://gerrit.wikimedia.org/r/214893 (https://phabricator.wikimedia.org/T100313) (owner: Ladsgroup)
[01:27:38] (PS1) Amire80: Update the ssh key for amire80 [puppet] - https://gerrit.wikimedia.org/r/262852
[01:28:12] (CR) Alexandros Kosiaris: [C: 2 V: 2] "LGTM.
Amir is sitting right next to me while generating this" [puppet] - https://gerrit.wikimedia.org/r/262852 (owner: Amire80)
[01:28:19] (PS2) Alexandros Kosiaris: Update the ssh key for amire80 [puppet] - https://gerrit.wikimedia.org/r/262852 (owner: Amire80)
[01:28:24] (CR) Alexandros Kosiaris: [V: 2] Update the ssh key for amire80 [puppet] - https://gerrit.wikimedia.org/r/262852 (owner: Amire80)
[01:31:54] (PS1) Alexandros Kosiaris: Fix typo in amire80's new key [puppet] - https://gerrit.wikimedia.org/r/262854
[01:32:51] (CR) Alexandros Kosiaris: [C: 2 V: 2] Fix typo in amire80's new key [puppet] - https://gerrit.wikimedia.org/r/262854 (owner: Alexandros Kosiaris)
[01:40:02] (CR) Nikerabbit: "Could you elaborate what do you mean with that?" [mediawiki-config] - https://gerrit.wikimedia.org/r/214893 (https://phabricator.wikimedia.org/T100313) (owner: Ladsgroup)
[01:50:50] Ops-Access-Requests, operations, Analytics, ContentTranslation-Analytics, and 2 others: access for amire80 to stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T122524#1919427 (Amire80) It works for me in stat1003 (and thanks to @akosiaris for some extra help with ssh configuration). @Nuri...
[02:25:11] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 33s)
[02:25:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:27:54] PROBLEM - puppet last run on iridium is CRITICAL: CRITICAL: Puppet has 1 failures
[02:32:04] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Jan 7 02:32:04 UTC 2016 (duration 6m 54s)
[02:32:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:46:14] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed
[02:46:48] (PS1) Andrew Bogott: WIP: Send email to project admins when puppet fails.
[puppet] - https://gerrit.wikimedia.org/r/262856 (https://phabricator.wikimedia.org/T121773)
[02:48:13] (CR) jenkins-bot: [V: -1] WIP: Send email to project admins when puppet fails. [puppet] - https://gerrit.wikimedia.org/r/262856 (https://phabricator.wikimedia.org/T121773) (owner: Andrew Bogott)
[02:48:15] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active
[02:54:54] RECOVERY - puppet last run on iridium is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[03:21:34] (CR) Jforrester: [C: -2] "Blocked until access to this functionality is logged." [mediawiki-config] - https://gerrit.wikimedia.org/r/138684 (https://phabricator.wikimedia.org/T68450) (owner: Legoktm)
[03:22:02] (CR) Jforrester: "Is this still needed?" [mediawiki-config] - https://gerrit.wikimedia.org/r/251137 (owner: Ori.livneh)
[03:22:29] (PS2) Jforrester: Remove proxyunbannable [mediawiki-config] - https://gerrit.wikimedia.org/r/254842 (https://phabricator.wikimedia.org/T75414) (owner: Cenarium)
[03:25:52] (PS4) Jforrester: Add portal namespace to ps.wikipedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/255519 (https://phabricator.wikimedia.org/T119510) (owner: Mdann52)
[03:26:19] (PS5) Jforrester: Namespace config change on de.wikivoyage.org [mediawiki-config] - https://gerrit.wikimedia.org/r/255361 (https://phabricator.wikimedia.org/T119420) (owner: Mdann52)
[03:26:38] (PS8) Jforrester: Enable new user groups on gu.wikipedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/255810 (https://phabricator.wikimedia.org/T119787) (owner: Mdann52)
[03:27:11] (CR) Jforrester: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/240065 (https://phabricator.wikimedia.org/T104251) (owner: Mdann52)
[03:28:41] (PS2) Jforrester: Enable WikidataPageBanner extension on Ukrainian Wikivoyage [mediawiki-config] -
https://gerrit.wikimedia.org/r/261994 (https://phabricator.wikimedia.org/T121999) (owner: RLuts)
[03:28:48] (CR) Jforrester: [C: 1] Enable WikidataPageBanner extension on Ukrainian Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/261994 (https://phabricator.wikimedia.org/T121999) (owner: RLuts)
[03:29:06] (Abandoned) Legoktm: Set $wgTitleBlacklistLogHits = true on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/138684 (https://phabricator.wikimedia.org/T68450) (owner: Legoktm)
[03:29:42] (CR) Jforrester: [C: 1] Add enwiki as transwiki import source for ta.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/262352 (https://phabricator.wikimedia.org/T122808) (owner: Shanmugamp7)
[03:29:56] (CR) Jforrester: [C: 1] Added noindex rule for uawikimedia's user namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/261902 (https://phabricator.wikimedia.org/T122732) (owner: Base)
[03:30:17] (CR) Jforrester: [C: 1] Changed user group rights at trwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/261869 (https://phabricator.wikimedia.org/T122710) (owner: Luke081515)
[03:30:29] (CR) Jforrester: [C: 1] Enable global AbuseFilter at French Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/257868 (https://phabricator.wikimedia.org/T120568) (owner: Glaisher)
[03:30:42] (CR) Jforrester: [C: 1] dewikibooks: Set $wgRestrictDisplayTitle to false [mediawiki-config] - https://gerrit.wikimedia.org/r/260964 (https://phabricator.wikimedia.org/T122433) (owner: Luke081515)
[03:30:54] (CR) Jforrester: [C: 1] Enable new user groups on gu.wikipedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/255810 (https://phabricator.wikimedia.org/T119787) (owner: Mdann52)
[03:31:03] (CR) Jforrester: [C: 1] Tidy robots.txt [mediawiki-config] - https://gerrit.wikimedia.org/r/240065 (https://phabricator.wikimedia.org/T104251) (owner: Mdann52)
[03:31:12] (CR) Jforrester: [C: 1] Set wgLocaltimezone for orwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/260745 (https://phabricator.wikimedia.org/T122273) (owner: Reedy)
[03:31:20] (CR) Jforrester: [C: 1] Remove wgArticlePath from InitialiseSettings as it's in CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/260242 (owner: Reedy)
[03:31:29] (CR) Jforrester: [C: 1] Exempt private/fishbowl wikis from the global title blacklist [mediawiki-config] - https://gerrit.wikimedia.org/r/244140 (https://phabricator.wikimedia.org/T114873) (owner: TTO)
[03:31:38] (CR) Jforrester: [C: 1] Namespace config change on de.wikivoyage.org [mediawiki-config] - https://gerrit.wikimedia.org/r/255361 (https://phabricator.wikimedia.org/T119420) (owner: Mdann52)
[03:31:49] (CR) Jforrester: [C: 1] Template editor group on hi.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/258444 (https://phabricator.wikimedia.org/T120342) (owner: Dereckson)
[03:31:59] (CR) Jforrester: [C: 1] Get rid of $wmg hack for MassMessage settings [mediawiki-config] - https://gerrit.wikimedia.org/r/237686 (owner: Legoktm)
[03:32:10] (CR) Jforrester: [C: 1] Enable interface-editor group at urwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/258453 (https://phabricator.wikimedia.org/T120348) (owner: Luke081515)
[03:32:19] (CR) Jforrester: [C: 1] Enable NewUserMessage on ps.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/258672 (https://phabricator.wikimedia.org/T121132) (owner: Dereckson)
[03:32:29] (CR) Jforrester: [C: 1] Set site name on sr.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/258670 (https://phabricator.wikimedia.org/T121278) (owner: Dereckson)
[03:32:44] (CR) Jforrester: [C: 1] Enable flood group at lvwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/258477 (https://phabricator.wikimedia.org/T121238) (owner: Luke081515)
[03:32:54] (CR) Jforrester: [C: 1] Allow sysop to grant and revoke transwiki on gu.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/258474 (https://phabricator.wikimedia.org/T120346) (owner: Dereckson)
[03:33:02] (CR) Jforrester: [C: 1] Namespace configuration on my.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/258442 (https://phabricator.wikimedia.org/T119807) (owner: Dereckson)
[03:33:10] (CR) Jforrester: [C: 1] Import sources on gu.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/258441 (https://phabricator.wikimedia.org/T120346) (owner: Dereckson)
[03:33:17] (CR) Jforrester: [C: 1] Namespace configuration on pa.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/258436 (https://phabricator.wikimedia.org/T120936) (owner: Dereckson)
[03:33:24] (CR) Jforrester: [C: 1] Remove proxyunbannable [mediawiki-config] - https://gerrit.wikimedia.org/r/254842 (https://phabricator.wikimedia.org/T75414) (owner: Cenarium)
[03:33:38] (CR) Jforrester: [C: 1] Add portal namespace to ps.wikipedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/255519 (https://phabricator.wikimedia.org/T119510) (owner: Mdann52)
[03:33:54] (CR) Jforrester: [C: 1] Get rid of old unused $wgAllowed* variables [mediawiki-config] - https://gerrit.wikimedia.org/r/256853 (https://phabricator.wikimedia.org/T50493) (owner: Alex Monk)
[04:05:15] PROBLEM - Last backup of the maps filesystem on labstore1001 is CRITICAL: CRITICAL - Last run result for unit replicate-maps was exit-code
[05:03:30] Ops-Access-Requests, operations: Grant katie access to hive tables from stat1002 - https://phabricator.wikimedia.org/T122977#1919604 (RobH)
[05:03:43] Ops-Access-Requests, operations: Grant katie access to hive tables from stat1002 - https://phabricator.wikimedia.org/T122977#1919605 (RobH) p:Triage>Normal
[06:04:02] (PS2) Andrew Bogott: WIP: Send email to project admins when
puppet fails. [puppet] - https://gerrit.wikimedia.org/r/262856 (https://phabricator.wikimedia.org/T121773)
[06:30:34] PROBLEM - puppet last run on cp3048 is CRITICAL: CRITICAL: puppet fail
[06:31:34] PROBLEM - puppet last run on mw1060 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:34] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:35] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:36] PROBLEM - puppet last run on mw2023 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:14] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:16] (CR) Ori.livneh: "Yes; I'll get to it eventually." [mediawiki-config] - https://gerrit.wikimedia.org/r/251137 (owner: Ori.livneh)
[06:56:24] RECOVERY - puppet last run on mw1060 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:25] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[06:56:34] RECOVERY - puppet last run on mw2023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:35] RECOVERY - puppet last run on cp3048 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:35] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:14] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:31:45] (PS5) Alexandros Kosiaris: Puppet provider for scap3 [puppet] - https://gerrit.wikimedia.org/r/262742
[07:32:38] (CR) jenkins-bot: [V: -1] Puppet provider for scap3 [puppet] - https://gerrit.wikimedia.org/r/262742 (owner: Alexandros Kosiaris)
[07:35:46] (PS6) Alexandros Kosiaris: Puppet provider for scap3 [puppet] - https://gerrit.wikimedia.org/r/262742
[08:04:54] (PS7) Alexandros Kosiaris: Puppet
provider for scap3 [puppet] - https://gerrit.wikimedia.org/r/262742
[08:07:26] (CR) Alexandros Kosiaris: "I think this is ready for review" [puppet] - https://gerrit.wikimedia.org/r/262742 (owner: Alexandros Kosiaris)
[08:13:34] operations, fundraising-tech-ops, Patch-For-Review: remove fundraising banner log related cruft from production puppet - https://phabricator.wikimedia.org/T118325#1919688 (akosiaris) @JGreen, I think all the blockers are done, should we move with the cleanup ?
[08:18:09] operations, Fundraising-Backlog, Security, fundraising-tech-ops: Delete gadolinium:/a/log/fundraising/ - https://phabricator.wikimedia.org/T92336#1919690 (akosiaris) According to https://phabricator.wikimedia.org/rOPUP6ce3f053b74cd1fe2525f4550a3f81753daa8277, gadolinium as well as erbium are now in...
[08:18:54] Blocked-on-Operations, operations, ops-eqiad: reclaim erbium, gadolinium into spares - https://phabricator.wikimedia.org/T123029#1919692 (akosiaris) NEW
[09:31:44] PROBLEM - very high load average likely xfs on ms-be1013 is CRITICAL: CRITICAL - load average: 229.62, 154.14, 76.00
[09:47:54] PROBLEM - puppet last run on mw1243 is CRITICAL: CRITICAL: Puppet has 1 failures
[09:55:14] PROBLEM - salt-minion processes on bohrium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:10:13] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 714
[10:14:54] RECOVERY - puppet last run on mw1243 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:18:24] PROBLEM - swift-account-server on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:18:33] PROBLEM - DPKG on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:18:33] PROBLEM - swift-object-updater on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:18:43] PROBLEM - swift-container-auditor on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:18:45] PROBLEM - swift-account-auditor on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:18:53] PROBLEM - swift-object-auditor on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:18:54] PROBLEM - swift-container-replicator on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:19:04] PROBLEM - swift-account-reaper on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:19:14] PROBLEM - swift-container-server on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:19:23] PROBLEM - SSH on ms-be1013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:19:34] PROBLEM - swift-container-updater on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:19:45] PROBLEM - puppet last run on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:19:53] PROBLEM - swift-object-replicator on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:19:54] PROBLEM - RAID on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:19:54] PROBLEM - swift-object-server on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:20:03] PROBLEM - Check size of conntrack table on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:20:13] RECOVERY - check_mysql on db1008 is OK: Uptime: 1446201 Threads: 3 Questions: 41532404 Slow queries: 16133 Opens: 59853 Flush tables: 2 Open tables: 415 Queries per second avg: 28.718 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[10:20:13] PROBLEM - swift-account-replicator on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:20:14] PROBLEM - dhclient process on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:20:15] PROBLEM - salt-minion processes on ms-be1013 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:22:13] RECOVERY - salt-minion processes on bohrium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:33:54] PROBLEM - Last backup of the tools filesystem on labstore1001 is CRITICAL: CRITICAL - Last run result for unit replicate-tools was exit-code
[10:35:13] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 814
[10:40:13] RECOVERY - check_mysql on db1008 is OK: Uptime: 1447402 Threads: 2 Questions: 41542819 Slow queries: 16152 Opens: 59854 Flush tables: 2 Open tables: 416 Queries per second avg: 28.701 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[10:41:33] (CR) Tim Landscheidt: "Can Labs projects with the default setup send outgoing mail?" [puppet] - https://gerrit.wikimedia.org/r/262856 (https://phabricator.wikimedia.org/T121773) (owner: Andrew Bogott)
[10:54:23] PROBLEM - NTP on ms-be1013 is CRITICAL: NTP CRITICAL: No response from NTP server
[11:06:14] (PS1) Yuvipanda: tools: Add misctools package to all exec hosts [puppet] - https://gerrit.wikimedia.org/r/262882
[11:08:40] (CR) Yuvipanda: [C: 2] tools: Add misctools package to all exec hosts [puppet] - https://gerrit.wikimedia.org/r/262882 (owner: Yuvipanda)
[11:58:13] PROBLEM - puppet last run on mw2209 is CRITICAL: CRITICAL: puppet fail
[12:00:13] PROBLEM - Host ms-be1013 is DOWN: PING CRITICAL - Packet loss = 100%
[12:25:14] RECOVERY - puppet last run on mw2209 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:59:14] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: puppet fail
[13:18:52] (PS1) Mdann52: Add http://webapi.aucklandmuseum.com/ to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/262893 (https://phabricator.wikimedia.org/T122995)
[13:28:24] RECOVERY - puppet last run on
mw2045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:42:42] (PS1) Mdann52: Enable Wikilove extenstion on es.wikivoyage.org [mediawiki-config] - https://gerrit.wikimedia.org/r/262894
[13:43:21] (PS1) Mdann52: additional import sources for kn.wikisource.org [mediawiki-config] - https://gerrit.wikimedia.org/r/262895
[13:45:40] (PS2) Mdann52: additional import sources for kn.wikisource.org [mediawiki-config] - https://gerrit.wikimedia.org/r/262895 (https://phabricator.wikimedia.org/T122955)
[15:17:28] (CR) Luke081515: [C: -1] Add http://webapi.aucklandmuseum.com/ to $wgCopyUploadsDomains (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/262893 (https://phabricator.wikimedia.org/T122995) (owner: Mdann52)
[15:20:00] (CR) Luke081515: [C: -1] additional import sources for kn.wikisource.org (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/262895 (https://phabricator.wikimedia.org/T122955) (owner: Mdann52)
[15:21:04] operations: add to alias - https://phabricator.wikimedia.org/T122927#1920300 (Aklapper) p:High>Triage [[ https://www.mediawiki.org/wiki/Phabricator/Project_management#Setting_task_priorities | Resetting priority to default ]]. If you think there is some urgency compared to other high priority tasks on t...
[15:21:22] operations: add user jrabbah@ to strategicpartnerships@ - https://phabricator.wikimedia.org/T122989#1920303 (Aklapper) p:Normal>Triage
[15:24:36] (CR) Luke081515: [C: -1] "The change should include a comment like "// T119787" to find the task for that change."
[mediawiki-config] - https://gerrit.wikimedia.org/r/255810 (https://phabricator.wikimedia.org/T119787) (owner: Mdann52)
[15:29:25] (PS9) Luke081515: Enable new user groups on gu.wikipedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/255810 (https://phabricator.wikimedia.org/T119787) (owner: Mdann52)
[15:30:19] (CR) Luke081515: [C: 1] Enable new user groups on gu.wikipedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/255810 (https://phabricator.wikimedia.org/T119787) (owner: Mdann52)
[16:07:33] operations: add to alias - https://phabricator.wikimedia.org/T122927#1920366 (eliza) Thank you for the info - Priority was updated to urgent as per user's request. as per SLien: "I will be temporarily managing Wikimedia 15 outreach from Jimmy' Let me know if you need anything else. Eliza
[16:15:33] operations: add slien to jimmy alias - https://phabricator.wikimedia.org/T122927#1920384 (Aklapper)
[16:29:34] (PS2) Mdann52: Add http://webapi.aucklandmuseum.com/ to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/262893 (https://phabricator.wikimedia.org/T122995)
[16:49:31] (PS3) Mdann52: additional import sources for kn.wikisource.org [mediawiki-config] - https://gerrit.wikimedia.org/r/262895 (https://phabricator.wikimedia.org/T122955)
[16:53:31] Blocked-on-Operations, operations, RESTBase, Services: Switch RESTBase to use Node.js 4.2 - https://phabricator.wikimedia.org/T107762#1920466 (GWicke) Things are looking good so far. As expected, the GC behavior is a bit more incremental in 4.2's V8, but the rate of reaching the heap threshold has n...
[17:00:04] Deploy window Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160107T1700)
[17:15:46] (CR) Alexandros Kosiaris: [C: 2] RuboCop: fixed Style/CaseIndentation offense [puppet] - https://gerrit.wikimedia.org/r/259699 (https://phabricator.wikimedia.org/T112651) (owner: Zfilipin)
[17:15:54] (PS3) Alexandros Kosiaris: RuboCop: fixed Style/CaseIndentation offense [puppet] - https://gerrit.wikimedia.org/r/259699 (https://phabricator.wikimedia.org/T112651) (owner: Zfilipin)
[17:15:56] (CR) Alexandros Kosiaris: [V: 2] RuboCop: fixed Style/CaseIndentation offense [puppet] - https://gerrit.wikimedia.org/r/259699 (https://phabricator.wikimedia.org/T112651) (owner: Zfilipin)
[17:29:13] (CR) Luke081515: [C: -1] Add http://webapi.aucklandmuseum.com/ to $wgCopyUploadsDomains (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/262893 (https://phabricator.wikimedia.org/T122995) (owner: Mdann52)
[17:29:25] (CR) Luke081515: [C: 1] additional import sources for kn.wikisource.org [mediawiki-config] - https://gerrit.wikimedia.org/r/262895 (https://phabricator.wikimedia.org/T122955) (owner: Mdann52)
[17:43:24] PROBLEM - High load average on labstore1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [24.0]
[17:45:33] RECOVERY - High load average on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[17:59:50] (PS1) Faidon Liambotis: mediawiki: really silence HHVM restart cron [puppet] - https://gerrit.wikimedia.org/r/262911
[18:00:45] (CR) Faidon Liambotis: [C: 2 V: 2] mediawiki: really silence HHVM restart cron [puppet] - https://gerrit.wikimedia.org/r/262911 (owner: Faidon Liambotis)
[18:23:47] operations, Pybal: conftool backend errors during merge - https://phabricator.wikimedia.org/T114091#1920589 (Joe) a:Joe
[18:36:48] (PS3) Faidon Liambotis: interface: lint interface-rps.py [puppet] -
https://gerrit.wikimedia.org/r/262595 (owner: Hashar)
[18:39:11] (CR) Faidon Liambotis: [C: 2] interface: lint interface-rps.py [puppet] - https://gerrit.wikimedia.org/r/262595 (owner: Hashar)
[18:45:21] PROBLEM - MariaDB Slave Lag: s1 on db1051 is CRITICAL: CRITICAL slave_sql_lag Seconds_Behind_Master: 362
[18:49:41] RECOVERY - MariaDB Slave Lag: s1 on db1051 is OK: OK slave_sql_lag Seconds_Behind_Master: 0
[18:53:39] <_joe_> !log powercycling ms-be1013
[18:53:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:54:47] <_joe_> !log also resetting the drac
[18:54:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:56:25] RECOVERY - swift-container-auditor on ms-be1013 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[18:56:25] RECOVERY - swift-object-auditor on ms-be1013 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[18:56:25] RECOVERY - SSH on ms-be1013 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2wmfprecise2 (protocol 2.0)
[18:56:25] RECOVERY - swift-object-updater on ms-be1013 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[18:56:25] RECOVERY - swift-account-auditor on ms-be1013 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[18:56:25] RECOVERY - swift-account-server on ms-be1013 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[18:56:26] RECOVERY - DPKG on ms-be1013 is OK: All packages OK
[18:56:26] RECOVERY - swift-container-replicator on ms-be1013 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[18:56:27] RECOVERY - swift-account-reaper on ms-be1013 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[18:56:27] RECOVERY - swift-container-server on ms-be1013 is OK:
PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[18:56:34] RECOVERY - Host ms-be1013 is UP: PING OK - Packet loss = 0%, RTA = 1.75 ms
[18:57:14] RECOVERY - swift-container-updater on ms-be1013 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[18:57:23] RECOVERY - salt-minion processes on ms-be1013 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[18:57:23] RECOVERY - Check size of conntrack table on ms-be1013 is OK: OK: nf_conntrack is 9 % full
[18:57:23] RECOVERY - very high load average likely xfs on ms-be1013 is OK: OK - load average: 26.14, 9.29, 3.33
[18:57:24] RECOVERY - dhclient process on ms-be1013 is OK: PROCS OK: 0 processes with command name dhclient
[18:57:24] RECOVERY - swift-account-replicator on ms-be1013 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[18:57:24] RECOVERY - swift-object-replicator on ms-be1013 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[18:57:24] RECOVERY - swift-object-server on ms-be1013 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[18:57:25] RECOVERY - RAID on ms-be1013 is OK: OK: optimal, 14 logical, 14 physical
[18:57:27] (PS2) Faidon Liambotis: network: move frack networks into a separate realm [puppet] - https://gerrit.wikimedia.org/r/260923
[18:57:29] (PS2) Faidon Liambotis: network: add sandbox "realm" [puppet] - https://gerrit.wikimedia.org/r/260925
[18:57:31] (PS2) Faidon Liambotis: network: split frack into its proper subnets [puppet] - https://gerrit.wikimedia.org/r/260924
[18:57:51] (CR) Faidon Liambotis: network: split frack into its proper subnets (1 comment) [puppet] - https://gerrit.wikimedia.org/r/260924 (owner: Faidon Liambotis)
[19:04:02] operations, ops-eqiad: ms-be1013 drac is not reachable via ssh -
https://phabricator.wikimedia.org/T123086#1920652 (Joe) NEW
[19:04:15] operations, ops-eqiad: ms-be1013 drac is not reachable via ssh - https://phabricator.wikimedia.org/T123086#1920659 (Joe) p:Triage>Low
[19:11:52] !log run sudo lvremove backup/tools20151216020005 on labstore2001 to clean up full snapshot
[19:11:53] !log setting up watchdog process killing long running queries on db1051
[19:11:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:12:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:13:33] !log remove snapshots others20150815030010, others20150815030010, maps20151216040005 and maps20151028040004 that were all stale and should've been removed anyway
[19:13:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:13:39] !log remove snapshots others20150815030010, others20150815030010, maps20151216040005 and maps20151028040004 that were all stale and should've been removed anyway (on labstore2001)
[19:14:44] RECOVERY - Last backup of the tools filesystem on labstore1001 is OK: OK - Last run for unit replicate-tools was successful
[19:19:25] RECOVERY - Last backup of the maps filesystem on labstore1001 is OK: OK - Last run for unit replicate-maps was successful
[19:21:02] !log started tools / maps backup on labstore1001
[19:21:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:49:00] operations, Traffic: Increase request limits for GETs to /api/rest_v1/ - https://phabricator.wikimedia.org/T118365#1920753 (Kelson) @Faidon What I can say is that Kiwix mwoffliner instances are hitting the limit and after having to deal with mass storage limits (IO and space) and RAM limit, now we have to...
[19:52:24] operations, Traffic: Increase request limits for GETs to /api/rest_v1/ - https://phabricator.wikimedia.org/T118365#1920771 (faidon) Thanks, @Kelson, this is helpful feedback.
Note that we have temporarily reverted the limits since Dec 28th with 4c07fac36de29eca061cb1d99d5a48464623a8d4. We'll consider this...
[20:09:38] operations, Traffic: Increase request limits for GETs to /api/rest_v1/ - https://phabricator.wikimedia.org/T118365#1920784 (Kelson) @faidon Great, really happy to hear that this limitations are temporarily off.
[20:18:23] (PS2) Dzahn: Add pc200[4-6] MAC address entries Bug:T121879 [puppet] - https://gerrit.wikimedia.org/r/262669 (https://phabricator.wikimedia.org/T121879) (owner: Papaul)
[20:19:07] (CR) Dzahn: [C: 2] Add pc200[4-6] MAC address entries Bug:T121879 [puppet] - https://gerrit.wikimedia.org/r/262669 (https://phabricator.wikimedia.org/T121879) (owner: Papaul)
[20:26:51] Ops-Access-Requests, operations, Analytics, ContentTranslation-Analytics, and 2 others: access for amire80 to stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T122524#1920799 (Dzahn)
[20:32:21] operations: onboarding Emanuele Rocca - https://phabricator.wikimedia.org/T123089#1920803 (Dzahn) NEW
[20:38:23] operations: add slien to jimmy alias - https://phabricator.wikimedia.org/T122927#1920819 (Dzahn) So this is https://wikimediafoundation.org/wiki/User:SLien_%28WMF%29 it seems. I wonder if we are expected to somehow confirm this before changing access to Jimmy's mail, but i wouldn't know how.
[20:39:35] operations: add slien to jimmy alias - https://phabricator.wikimedia.org/T122927#1920820 (Dzahn) Maybe the jimmy@ alias can be moved over to Google and OIT ? @cajoel
[20:43:41] operations, Traffic: Increase request limits for GETs to /api/rest_v1/ - https://phabricator.wikimedia.org/T118365#1920829 (BBlack) @Kelson - does Kiwix mwoffliner use an authenticated session, or is it anonymous? For future rate-limiting plans, it makes a big difference.
[20:47:04] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0]
[20:47:13] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0]
[20:53:23] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[20:53:24] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[20:57:33] (PS1) Legoktm: extdist: Unbreak [puppet] - https://gerrit.wikimedia.org/r/262921 (https://phabricator.wikimedia.org/T123090)
[21:00:24] PROBLEM - puppet last run on mw2157 is CRITICAL: CRITICAL: Puppet has 1 failures
[21:06:17] (CR) Yuvipanda: extdist: Unbreak (1 comment) [puppet] - https://gerrit.wikimedia.org/r/262921 (https://phabricator.wikimedia.org/T123090) (owner: Legoktm)
[21:07:32] (PS2) Legoktm: extdist: Unbreak [puppet] - https://gerrit.wikimedia.org/r/262921 (https://phabricator.wikimedia.org/T123090)
[21:25:24] RECOVERY - puppet last run on mw2157 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[21:35:21] operations, ops-eqiad: ms-be1013 drac is not reachable via ssh - https://phabricator.wikimedia.org/T123086#1920972 (Cmjohnson) Typically, this requires powering down and draining flea power.
[21:57:46] operations, Deployment-Systems, Traffic: Varnish cache busting desired for /static/$VERSION/ resources which change within the lifetime of a branch - https://phabricator.wikimedia.org/T99096#1920989 (Krinkle) I discussed the proposal in T99096#1583708 with @TStarling and @mark at the Dev Summit. The o...
[21:58:23] TimStarling: mark: https://phabricator.wikimedia.org/T99096
[22:37:17] (PS2) Giuseppe Lavagetto: Add native ipvs manager [debs/pybal] - https://gerrit.wikimedia.org/r/261375
[22:40:28] (CR) jenkins-bot: [V: -1] Add native ipvs manager [debs/pybal] - https://gerrit.wikimedia.org/r/261375 (owner: Giuseppe Lavagetto)
[22:41:55] (CR) Alexandros Kosiaris: [C: 2] network: move frack networks into a separate realm [puppet] - https://gerrit.wikimedia.org/r/260923 (owner: Faidon Liambotis)
[22:42:00] (PS3) Alexandros Kosiaris: network: move frack networks into a separate realm [puppet] - https://gerrit.wikimedia.org/r/260923 (owner: Faidon Liambotis)
[22:42:06] (CR) Alexandros Kosiaris: [V: 2] network: move frack networks into a separate realm [puppet] - https://gerrit.wikimedia.org/r/260923 (owner: Faidon Liambotis)
[22:42:27] (PS1) Papaul: Replaced all spacing with tab Bug:T121879 [puppet] - https://gerrit.wikimedia.org/r/262998 (https://phabricator.wikimedia.org/T121879)
[22:45:14] (PS1) Anomie: Prepare for merged of ApiSandbox into core [mediawiki-config] - https://gerrit.wikimedia.org/r/262999
[22:45:16] (PS1) Anomie: Undeploy ApiSandbox extension [mediawiki-config] - https://gerrit.wikimedia.org/r/263000
[22:57:20] !log depool scb1002 for mobileapps.
Transition to nodejs 4.2 ongoing
[22:57:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:58:22] !log disable puppet and salt on scb1001 from nodejs 4.2 transition
[22:58:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:01:23] !log apt-mark hold nodejs on scb1001, etherpad1001 and maps-test200{1,2,3,4}
[23:01:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:03:33] PROBLEM - salt-minion processes on scb1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[23:05:33] PROBLEM - DPKG on etherpad1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:06:34] PROBLEM - DPKG on scb1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:06:54] PROBLEM - DPKG on restbase2006 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:06:54] PROBLEM - DPKG on restbase2005 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:07:03] PROBLEM - DPKG on restbase1007 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:07:15] PROBLEM - DPKG on restbase-test2002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:07:24] PROBLEM - DPKG on restbase1008 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:07:34] PROBLEM - DPKG on restbase2003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:07:34] PROBLEM - DPKG on restbase2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:07:44] PROBLEM - DPKG on restbase1005 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:07:54] PROBLEM - DPKG on restbase1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:07:54] PROBLEM - DPKG on restbase1009 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:08:03] PROBLEM - DPKG on restbase-test2003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:08:04] PROBLEM - DPKG on restbase1001 is CRITICAL: DPKG CRITICAL dpkg reports broken
packages
[23:08:13] PROBLEM - DPKG on restbase1004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:08:14] PROBLEM - DPKG on restbase2002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:08:23] PROBLEM - DPKG on restbase-test2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:08:27] did someone just break packaging?
[23:08:34] PROBLEM - DPKG on restbase1006 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:08:34] PROBLEM - DPKG on restbase1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:08:44] PROBLEM - DPKG on restbase2004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[23:09:05] akosiaris: ^ these are due to your apt work now?
[23:09:26] (i saw your log for some of the affected servers, but not all of them, so just checking)
[23:09:53] !log mobileapps deploying 58b371a on scb1002
[23:09:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:10:08] RobH: yup, that's him, all good
[23:10:11] :P
[23:10:30] I thought we weren't merging things live during the all staff =P
[23:12:10] (CR) Anomie: [C: -2] "Don't merge yet, wait until all deployed branches contain Ic42a6c5ef54b811cd63cfef2132942b27a626fe5." [mediawiki-config] - https://gerrit.wikimedia.org/r/263000 (owner: Anomie)
[23:13:37] If google authenticator lost my 2 factor authentication, is there anyway I can reset my token to access wikitech.wikimedia.org?
[23:16:33] Thehelpfulone, do you have your recovery codes?
[23:16:58] umm probably not :p
[23:17:08] where would they be? email?
[23:17:11] no
[23:20:32] I can theoretically remove users' 2FA requirements on wikitech, let me know later if you still can't get in and I'll see what we need to check
[23:20:48] (PS2) Anomie: Prepare for merge of ApiSandbox into core [mediawiki-config] - https://gerrit.wikimedia.org/r/262999
[23:21:04] (PS2) Anomie: Undeploy ApiSandbox extension [mediawiki-config] - https://gerrit.wikimedia.org/r/263000
[23:21:49] RobH: yes they are
[23:21:53] I 'll fix that
[23:22:45] Krenair: thanks, yeah still can't seem to login, pretty sure my password is right
[23:23:13] !log mobileapps deploying 58b371a on scb1001
[23:23:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:23:54] PROBLEM - puppet last run on restbase1006 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:24:33] RECOVERY - salt-minion processes on scb1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[23:24:47] !log enabled puppet,salt on scb1001
[23:24:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:24:55] !log repooled scb1002 for mobileapps
[23:24:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:25:13] PROBLEM - puppet last run on restbase1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:25:34] PROBLEM - puppet last run on restbase1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:27:43] RECOVERY - DPKG on scb1001 is OK: All packages OK
[23:29:23] PROBLEM - puppet last run on restbase1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:29:34] PROBLEM - puppet last run on scb1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:30:35] PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:31:23] PROBLEM - puppet last run on restbase2006 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:32:45] PROBLEM - puppet last run on restbase1008 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:35:23] PROBLEM -
puppet last run on restbase2003 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:35:34] PROBLEM - puppet last run on restbase2004 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:38:14] PROBLEM - puppet last run on restbase-test2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:38:55] Thehelpfulone, can you ssh to a labs host?
[23:38:56] ACKNOWLEDGEMENT - puppet last run on restbase-test2001 is CRITICAL: CRITICAL: Puppet has 1 failures alexandros kosiaris apt-mark hold nodejs for migration to nodejs 4.2
[23:38:56] ACKNOWLEDGEMENT - puppet last run on restbase-test2003 is CRITICAL: CRITICAL: Puppet has 1 failures alexandros kosiaris apt-mark hold nodejs for migration to nodejs 4.2
[23:38:56] ACKNOWLEDGEMENT - puppet last run on restbase1001 is CRITICAL: CRITICAL: Puppet has 1 failures alexandros kosiaris apt-mark hold nodejs for migration to nodejs 4.2
[23:38:56] ACKNOWLEDGEMENT - puppet last run on restbase1002 is CRITICAL: CRITICAL: Puppet has 1 failures alexandros kosiaris apt-mark hold nodejs for migration to nodejs 4.2
[23:38:56] ACKNOWLEDGEMENT - puppet last run on restbase1005 is CRITICAL: CRITICAL: Puppet has 1 failures alexandros kosiaris apt-mark hold nodejs for migration to nodejs 4.2
[23:38:56] ACKNOWLEDGEMENT - puppet last run on restbase1006 is CRITICAL: CRITICAL: Puppet has 1 failures alexandros kosiaris apt-mark hold nodejs for migration to nodejs 4.2
[23:38:56] ACKNOWLEDGEMENT - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures alexandros kosiaris apt-mark hold nodejs for migration to nodejs 4.2
[23:38:57] ACKNOWLEDGEMENT - puppet last run on restbase1008 is CRITICAL: CRITICAL: Puppet has 1 failures alexandros kosiaris apt-mark hold nodejs for migration to nodejs 4.2
[23:38:57] ACKNOWLEDGEMENT - puppet last run on restbase2001 is CRITICAL: CRITICAL: Puppet has 1 failures alexandros kosiaris apt-mark hold nodejs for migration to nodejs 4.2
[23:38:58] ACKNOWLEDGEMENT - puppet last run on restbase2003 is CRITICAL:
CRITICAL: Puppet has 1 failures alexandros kosiaris apt-mark hold nodejs for migration to nodejs 4.2
[23:38:58] ACKNOWLEDGEMENT - puppet last run on restbase2004 is CRITICAL: CRITICAL: Puppet has 1 failures alexandros kosiaris apt-mark hold nodejs for migration to nodejs 4.2
[23:38:59] ACKNOWLEDGEMENT - puppet last run on restbase2006 is CRITICAL: CRITICAL: Puppet has 1 failures alexandros kosiaris apt-mark hold nodejs for migration to nodejs 4.2
[23:40:23] ACKNOWLEDGEMENT - DPKG on restbase-test2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages alexandros kosiaris apt-mark hold nodejs for nodejs 4.2 transition
[23:40:23] ACKNOWLEDGEMENT - DPKG on restbase-test2002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages alexandros kosiaris apt-mark hold nodejs for nodejs 4.2 transition
[23:40:23] ACKNOWLEDGEMENT - DPKG on restbase-test2003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages alexandros kosiaris apt-mark hold nodejs for nodejs 4.2 transition
[23:40:23] ACKNOWLEDGEMENT - DPKG on restbase1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages alexandros kosiaris apt-mark hold nodejs for nodejs 4.2 transition
[23:40:23] ACKNOWLEDGEMENT - DPKG on restbase1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages alexandros kosiaris apt-mark hold nodejs for nodejs 4.2 transition
[23:40:24] ACKNOWLEDGEMENT - DPKG on restbase1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages alexandros kosiaris apt-mark hold nodejs for nodejs 4.2 transition
[23:40:24] ACKNOWLEDGEMENT - puppet last run on restbase1003 is CRITICAL: CRITICAL: Puppet has 1 failures alexandros kosiaris apt-mark hold nodejs for nodejs 4.2 transition
[23:40:25] ACKNOWLEDGEMENT - DPKG on restbase1004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages alexandros kosiaris apt-mark hold nodejs for nodejs 4.2 transition
[23:40:25] ACKNOWLEDGEMENT - DPKG on restbase1005 is CRITICAL: DPKG CRITICAL dpkg reports broken packages alexandros kosiaris apt-mark hold nodejs
for nodejs 4.2 transition
[23:40:26] ACKNOWLEDGEMENT - DPKG on restbase1006 is CRITICAL: DPKG CRITICAL dpkg reports broken packages alexandros kosiaris apt-mark hold nodejs for nodejs 4.2 transition
[23:40:26] ACKNOWLEDGEMENT - DPKG on restbase1007 is CRITICAL: DPKG CRITICAL dpkg reports broken packages alexandros kosiaris apt-mark hold nodejs for nodejs 4.2 transition
[23:40:27] ACKNOWLEDGEMENT - DPKG on restbase1008 is CRITICAL: DPKG CRITICAL dpkg reports broken packages alexandros kosiaris apt-mark hold nodejs for nodejs 4.2 transition
[23:40:27] ACKNOWLEDGEMENT - DPKG on restbase1009 is CRITICAL: DPKG CRITICAL dpkg reports broken packages alexandros kosiaris apt-mark hold nodejs for nodejs 4.2 transition
[23:40:28] ACKNOWLEDGEMENT - DPKG on restbase2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages alexandros kosiaris apt-mark hold nodejs for nodejs 4.2 transition
[23:40:54] maybe, haven't set up labs on this machine though
[23:41:16] let's go to PM
[23:42:14] RECOVERY - puppet last run on scb1001 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[23:44:03] (PS1) Gergő Tisza: Add Sentry hiera rules to deployment-prep [puppet] - https://gerrit.wikimedia.org/r/263012 (https://phabricator.wikimedia.org/T85239)
[23:44:04] PROBLEM - puppet last run on restbase1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:46:25] PROBLEM - puppet last run on restbase2005 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:48:15] PROBLEM - puppet last run on restbase-test2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:49:24] (PS2) Papaul: Replaced all spacing with tab Bug:T121879 [puppet] - https://gerrit.wikimedia.org/r/262998 (https://phabricator.wikimedia.org/T121879)
[23:50:04] PROBLEM - puppet last run on restbase2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:52:34] PROBLEM - salt-minion processes on sca1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[23:52:53]
PROBLEM - salt-minion processes on sca1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[23:54:30] (PS1) Alexandros Kosiaris: Apply the mathoid role on scb [puppet] - https://gerrit.wikimedia.org/r/263015
[23:56:09] (CR) Alexandros Kosiaris: [C: 2 V: 2] Apply the mathoid role on scb [puppet] - https://gerrit.wikimedia.org/r/263015 (owner: Alexandros Kosiaris)
[23:56:23] (PS3) Papaul: Replaced all spacing with tab Bug:T121879 [puppet] - https://gerrit.wikimedia.org/r/262998 (https://phabricator.wikimedia.org/T121879)