[00:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161115T0000). Please do the needful. [00:00:04] tgr: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [00:00:23] o/ [00:04:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [00:05:15] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [00:06:40] (03PS1) 10Rush: nfsclient: force link creation even dir exists in this case [puppet] - 10https://gerrit.wikimedia.org/r/321572 [00:06:52] tgr, I guess I cn deploy [00:07:42] (03PS2) 10MaxSem: Log Throttler events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321118 (owner: 10Gergő Tisza) [00:07:52] thanks MaxSem [00:07:54] (03CR) 10Rush: [C: 032] nfsclient: force link creation even dir exists in this case [puppet] - 10https://gerrit.wikimedia.org/r/321572 (owner: 10Rush) [00:08:25] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [00:08:27] (03CR) 10MaxSem: [C: 032] Log Throttler events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321118 (owner: 10Gergő Tisza) [00:08:44] office IPs are unthrottlable, right? 
[00:08:56] (03Merged) 10jenkins-bot: Log Throttler events [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321118 (owner: 10Gergő Tisza) [00:14:29] tgr, pulled on mw1099 in case you can test it [00:15:20] (03CR) 10Filippo Giunchedi: "Sadly I can't test this with puppet compiler due to T150456" [puppet] - 10https://gerrit.wikimedia.org/r/321568 (https://phabricator.wikimedia.org/T147326) (owner: 10Filippo Giunchedi) [00:16:08] (03CR) 10Filippo Giunchedi: "Ditto as I4899f4a84, can't be tested with puppet compiler due to T150456" [puppet] - 10https://gerrit.wikimedia.org/r/321567 (owner: 10Filippo Giunchedi) [00:18:16] (03PS1) 10Madhuvishy: Revert "nfs_mount: do mount{} absent prior to nfs-mount-manager" [puppet] - 10https://gerrit.wikimedia.org/r/321573 [00:19:34] (03CR) 10Madhuvishy: [C: 032] Revert "nfs_mount: do mount{} absent prior to nfs-mount-manager" [puppet] - 10https://gerrit.wikimedia.org/r/321573 (owner: 10Madhuvishy) [00:20:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [00:24:15] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [00:24:38] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/321118/ (duration: 00m 47s) [00:24:49] tgr, ^ [00:24:57] thx! 
[00:25:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:51:20] !log bromine - apt-get clean for disk space [00:52:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:56:32] !log bromine - deleted some un-packed mediawiki release versions from home/csteipp/releasetools for disk space (back tp 80%) [00:57:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:57:51] (03PS1) 10Filippo Giunchedi: graphite: avoid spikes in mw error rate alert [puppet] - 10https://gerrit.wikimedia.org/r/321577 [01:01:38] !log deploying ghostscript regression update on jobrunner-eqiad [01:02:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:05:29] !log deploying ghostscript regression update on API appservers [01:06:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:06:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [01:07:15] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [01:07:17] godog: ^ is that about avoiding spikes ? [01:07:29] i mean...what you just uploaded [01:07:39] yeah [01:07:49] ah :) [01:08:39] !log deploying ghostscript regression update on Imagescalers [01:09:06] but yeah most are at the top of the hour, when the spike happens [01:09:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:09:57] gotcha [01:11:35] PROBLEM - puppet last run on mw2149 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[ghostscript] [01:11:59] that will probably be just that dpkg was busy during run [01:12:27] !log deploying ghostscript regression update on Videoscalers (and manually on osmium) [01:13:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:22:00] (03PS1) 10Yuvipanda: tools: Have bind mount depend on the project mount [puppet] - 10https://gerrit.wikimedia.org/r/321579 [01:23:01] (03CR) 10Andrew Bogott: [C: 031] tools: Have bind mount depend on the project mount [puppet] - 10https://gerrit.wikimedia.org/r/321579 (owner: 10Yuvipanda) [01:23:03] (03CR) 10jenkins-bot: [V: 04-1] tools: Have bind mount depend on the project mount [puppet] - 10https://gerrit.wikimedia.org/r/321579 (owner: 10Yuvipanda) [01:25:51] (03PS2) 10Yuvipanda: tools: Have bind mount depend on the project mount [puppet] - 10https://gerrit.wikimedia.org/r/321579 [01:26:54] (03CR) 10jenkins-bot: [V: 04-1] tools: Have bind mount depend on the project mount [puppet] - 10https://gerrit.wikimedia.org/r/321579 (owner: 10Yuvipanda) [01:27:31] (03PS3) 10Yuvipanda: tools: Have bind mount depend on the project mount [puppet] - 10https://gerrit.wikimedia.org/r/321579 [01:29:31] (03CR) 10Andrew Bogott: [C: 031] tools: Have bind mount depend on the project mount [puppet] - 10https://gerrit.wikimedia.org/r/321579 (owner: 10Yuvipanda) [01:32:36] (03PS4) 10Yuvipanda: tools: Have bind mount depend on the project mount [puppet] - 10https://gerrit.wikimedia.org/r/321579 [01:34:55] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 1810.82684 Seconds [01:34:55] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 1811.902495 Seconds [01:35:55] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 32.227299 Seconds [01:35:55] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 33.296004 Seconds [01:38:35] RECOVERY - puppet last run on mw2149 is OK: 
OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [01:40:02] (03PS1) 10Dzahn: fix reverse names of ulsfo mgmt interfaces [dns] - 10https://gerrit.wikimedia.org/r/321581 (https://phabricator.wikimedia.org/T149875) [01:43:57] (03CR) 10Dzahn: "[radon:~] $ for mgmt in 1 2 3 6; do host 10.128.128.$mgmt; done" [dns] - 10https://gerrit.wikimedia.org/r/321581 (https://phabricator.wikimedia.org/T149875) (owner: 10Dzahn) [01:50:38] (03PS2) 10Dzahn: fix reverse names of ulsfo mgmt interfaces [dns] - 10https://gerrit.wikimedia.org/r/321581 (https://phabricator.wikimedia.org/T149875) [01:51:24] (03PS3) 10Dzahn: consistent capitalization of mgmt asset tag names [dns] - 10https://gerrit.wikimedia.org/r/320959 [01:52:58] (03CR) 10Dzahn: [C: 032] consistent capitalization of mgmt asset tag names [dns] - 10https://gerrit.wikimedia.org/r/320959 (owner: 10Dzahn) [01:53:40] (03PS3) 10Dzahn: fix reverse names of ulsfo mgmt interfaces [dns] - 10https://gerrit.wikimedia.org/r/321581 (https://phabricator.wikimedia.org/T149875) [01:59:25] (03CR) 10Yuvipanda: [C: 032] tools: Have bind mount depend on the project mount [puppet] - 10https://gerrit.wikimedia.org/r/321579 (owner: 10Yuvipanda) [02:01:09] what the hell is up with labs [02:01:13] i cannot even ssh in [02:01:52] (03PS1) 10Yuvipanda: labs: Fix possible ordering problem with nfs_client [puppet] - 10https://gerrit.wikimedia.org/r/321583 [02:02:05] Zppix, labs is working except for NFS [02:02:07] so tools is down [02:02:12] ffs [02:02:16] but beta etc. is working quite happily [02:02:17] why [02:02:42] i need a way to access a tool til its back up is there any way possible? 
[02:03:03] https://lists.wikimedia.org/pipermail/labs-l/2016-November/004768.html [02:03:31] (03PS1) 10Andrew Bogott: Explicitly set up /var/spool/gridengine on grid master [puppet] - 10https://gerrit.wikimedia.org/r/321584 [02:04:01] Krenair is there anyway i can get access to it [02:04:33] dunno [02:04:39] tools isn't really my thing [02:04:53] Krenair crap... [02:05:01] why, is it urgent? [02:06:34] Krenair well we're stuck with helpmebot which doesnt have a good ip lookup api and piaget2 which is what we normally use for -helpers is down prob because of the labs maintence so i was going to boot up my bot for a backup [02:08:30] that doesn't sound very urgent to me [02:09:10] Krenair we love our ip lookup (hence i personally think its urgent) [02:09:36] what is it used for? [02:10:09] to detect trolls and to possible proxy stuff etc etc [02:10:15] (in -help obvs) [02:11:00] (03PS2) 10Yuvipanda: labs: Require base classes than include them [puppet] - 10https://gerrit.wikimedia.org/r/321583 [02:12:08] (03CR) 10jenkins-bot: [V: 04-1] labs: Require base classes than include them [puppet] - 10https://gerrit.wikimedia.org/r/321583 (owner: 10Yuvipanda) [02:13:05] (03PS3) 10Yuvipanda: labs: Require base classes than include them [puppet] - 10https://gerrit.wikimedia.org/r/321583 [02:18:41] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.2) (duration: 06m 13s) [02:19:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:23:11] (03CR) 10Dzahn: base/ipmi: install freeipmi globally, move to ipmi module (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/320246 (https://phabricator.wikimedia.org/T150160) (owner: 10Dzahn) [02:26:22] (03PS7) 10Dzahn: tcpircbot: improve firewall rule setup [puppet] - 10https://gerrit.wikimedia.org/r/316497 [02:36:36] (03PS1) 10Yuvipanda: toollabs: Fix typo [puppet] - 10https://gerrit.wikimedia.org/r/321588 [02:37:42] (03CR) 10Madhuvishy: [C: 031] toollabs: Fix typo [puppet] - 
10https://gerrit.wikimedia.org/r/321588 (owner: 10Yuvipanda) [02:38:25] (03CR) 10Yuvipanda: [C: 032] labs: Require base classes than include them [puppet] - 10https://gerrit.wikimedia.org/r/321583 (owner: 10Yuvipanda) [02:38:42] (03CR) 10Yuvipanda: [C: 032 V: 032] toollabs: Fix typo [puppet] - 10https://gerrit.wikimedia.org/r/321588 (owner: 10Yuvipanda) [02:39:05] (03PS2) 10Dzahn: Add prod DNS for restbase201[0-2] Bug:T150680 [dns] - 10https://gerrit.wikimedia.org/r/321546 (https://phabricator.wikimedia.org/T150680) (owner: 10Papaul) [02:43:29] (03PS2) 10Andrew Bogott: Explicitly set up /var/spool/gridengine on grid master [puppet] - 10https://gerrit.wikimedia.org/r/321584 [02:50:50] (03PS1) 10Yuvipanda: toollabs: Actually not mount bind mount on boot [puppet] - 10https://gerrit.wikimedia.org/r/321591 [02:52:20] (03CR) 10Yuvipanda: [C: 032] toollabs: Actually not mount bind mount on boot [puppet] - 10https://gerrit.wikimedia.org/r/321591 (owner: 10Yuvipanda) [03:05:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [03:07:15] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [03:14:13] (03PS3) 10Andrew Bogott: Explicitly set up /var/spool/gridengine on grid master [puppet] - 10https://gerrit.wikimedia.org/r/321584 [03:14:15] (03PS1) 10Andrew Bogott: Toollabs: Fix mount dependency order [puppet] - 10https://gerrit.wikimedia.org/r/321594 [03:24:05] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 663.97 seconds [03:25:37] (03CR) 10jenkins-bot: [V: 04-1] Explicitly set up /var/spool/gridengine on grid master [puppet] - 10https://gerrit.wikimedia.org/r/321584 (owner: 10Andrew Bogott) [03:25:43] (03CR) 10jenkins-bot: [V: 04-1] Toollabs: Fix mount dependency order [puppet] - 10https://gerrit.wikimedia.org/r/321594 (owner: 10Andrew Bogott) [03:27:49] 
(03PS2) 10Andrew Bogott: Toollabs: Fix mount dependency order [puppet] - 10https://gerrit.wikimedia.org/r/321594 [03:27:51] (03PS4) 10Andrew Bogott: Explicitly set up /var/spool/gridengine on grid master [puppet] - 10https://gerrit.wikimedia.org/r/321584 [03:33:28] (03PS1) 10Chad: query.php: Serve HTTP 410 Gone [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321595 [03:41:55] PROBLEM - puppet last run on labstore1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [03:45:29] (03CR) 10Andrew Bogott: [C: 032] Toollabs: Fix mount dependency order [puppet] - 10https://gerrit.wikimedia.org/r/321594 (owner: 10Andrew Bogott) [03:46:06] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 226.03 seconds [03:47:55] RECOVERY - puppet last run on labstore1005 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [03:50:22] (03PS1) 10Chad: MWVersion: Use the version directly from multiversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321598 [03:51:22] (03PS1) 10Chad: Move 2 web entry points to w/ where they belong [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321599 [03:51:24] (03PS1) 10Chad: MWVersion.php simplification [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321600 [03:54:14] (03CR) 10Dzahn: [C: 032] Add prod DNS for restbase201[0-2] Bug:T150680 [dns] - 10https://gerrit.wikimedia.org/r/321546 (https://phabricator.wikimedia.org/T150680) (owner: 10Papaul) [03:57:59] (03PS1) 10Chad: Remove $wmgUsabilityPrefSwitch, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321601 [03:59:42] (03PS3) 10Dzahn: Add mgmt DNS entries for restbase201[0-2] [dns] - 10https://gerrit.wikimedia.org/r/321472 (https://phabricator.wikimedia.org/T150680) (owner: 10Papaul) [04:00:09] wow, usability initiave [04:00:16] (03CR) 10Dzahn: [C: 032] Add mgmt DNS entries for restbase201[0-2] [dns] - 10https://gerrit.wikimedia.org/r/321472 
(https://phabricator.wikimedia.org/T150680) (owner: 10Papaul) [04:00:56] (03PS8) 10Dzahn: tcpircbot: improve firewall rule setup [puppet] - 10https://gerrit.wikimedia.org/r/316497 [04:04:07] (03CR) 10Dzahn: [C: 032] tcpircbot: improve firewall rule setup [puppet] - 10https://gerrit.wikimedia.org/r/316497 (owner: 10Dzahn) [04:08:37] bawolff: #til usabilitywiki has a custom docroot [04:08:52] lol [04:09:27] What a disgusting apache config too [04:09:27] your edit to query.php made me nostolgic for query.php and the horrible hack to get around same origin policy pre the existence of CORS [04:12:01] (03CR) 10Dzahn: "confirmed on einsteinium and tegmen. big diff in the ferm config, no diff in resulting iptables rules (on einsteinium), on tegmen it resul" [puppet] - 10https://gerrit.wikimedia.org/r/316497 (owner: 10Dzahn) [04:19:25] gerrit 316497 was merged and i can still log [04:20:37] \!log gerrit 316497 was merged and i can still log, yours terbium [04:24:08] !log gerrit 316497 was merged and logging from puppetmaster1001 over to icinga still works [04:24:39] !log !log (test logging) [04:24:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:25:05] ok, good.. and ... out [04:25:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:26:44] (03CR) 10Dzahn: "04:24 mutante: !log (test logging)" [puppet] - 10https://gerrit.wikimedia.org/r/316497 (owner: 10Dzahn) [04:27:04] (03PS5) 10Andrew Bogott: Explicitly set up /var/spool/gridengine on grid master [puppet] - 10https://gerrit.wikimedia.org/r/321584 [04:49:05] PROBLEM - puppet last run on prometheus2002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [05:06:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [05:07:15] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 385 bytes in 0.002 second response time [05:08:15] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [05:16:15] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.037 second response time [05:16:35] PROBLEM - puppet last run on copper is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [05:18:05] RECOVERY - puppet last run on prometheus2002 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [05:44:35] RECOVERY - puppet last run on copper is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [06:06:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [06:09:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [06:10:15] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [07:05:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [07:07:15] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [07:30:35] PROBLEM - puppet last run on cp3032 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [07:42:05] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [07:43:05] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3211264 keys, up 14 days 23 hours - replication_delay is 0 [07:47:31] !log Deploy schema change s4 commonswiki.revision db1064 - T147305 [07:48:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:48:14] T147305: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305 [07:48:43] can someone investigate stashbot lags [07:53:01] !log install remaining curl security updates in eqiad and codfw [07:53:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:59:35] RECOVERY - puppet last run on cp3032 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [08:00:02] (03CR) 10Marostegui: [C: 031] Repool db2042 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321476 (https://phabricator.wikimedia.org/T150334) (owner: 10Jcrespo) [08:04:02] (03CR) 10Marostegui: "Just to be picky - I would add a comment with the link to the MariaDB documentation: https://mariadb.com/kb/en/mariadb/unix_socket-authent" [puppet] - 10https://gerrit.wikimedia.org/r/320822 (https://phabricator.wikimedia.org/T150446) (owner: 10Jcrespo) [08:05:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [08:08:15] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [08:27:48] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1064 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321611 [08:31:56] (03PS2) 10Giuseppe Lavagetto: Allow building upon wikimedia-jessie [calico-containers] - 
10https://gerrit.wikimedia.org/r/321388 [08:32:13] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] Allow building upon wikimedia-jessie [calico-containers] - 10https://gerrit.wikimedia.org/r/321388 (owner: 10Giuseppe Lavagetto) [08:32:51] arseny92: you will need to talk to bd808 about stashbot [08:35:03] arseny92: hi, if there is a problem with stashbot please fill a task ! [08:35:09] that will cc appropriate folks [08:40:43] (03PS1) 10Giuseppe Lavagetto: Add files to build debian binary package [calico-containers] - 10https://gerrit.wikimedia.org/r/321613 [08:41:44] (03Abandoned) 10Giuseppe Lavagetto: Add files to build debian binary package [calico-containers] - 10https://gerrit.wikimedia.org/r/321389 (owner: 10Giuseppe Lavagetto) [08:41:55] (03PS2) 10Giuseppe Lavagetto: Add files to build debian binary package [calico-containers] - 10https://gerrit.wikimedia.org/r/321613 [08:42:05] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] Add files to build debian binary package [calico-containers] - 10https://gerrit.wikimedia.org/r/321613 (owner: 10Giuseppe Lavagetto) [08:43:30] (03PS1) 10Giuseppe Lavagetto: Allow building upon wikimedia-jessie (cherry picked from commit d34224be54cf5fc274fea426925af774fb7af8d0) [calico-containers] (1.0.0-beta-rc5) - 10https://gerrit.wikimedia.org/r/321614 [08:43:48] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] Allow building upon wikimedia-jessie (cherry picked from commit d34224be54cf5fc274fea426925af774fb7af8d0) [calico-containers] (1.0.0-beta-rc5) - 10https://gerrit.wikimedia.org/r/321614 (owner: 10Giuseppe Lavagetto) [08:44:29] (03PS1) 10Giuseppe Lavagetto: Add files to build debian binary package [calico-containers] (1.0.0-beta-rc5) - 10https://gerrit.wikimedia.org/r/321615 [08:44:36] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] Add files to build debian binary package [calico-containers] (1.0.0-beta-rc5) - 10https://gerrit.wikimedia.org/r/321615 (owner: 10Giuseppe Lavagetto) [08:53:17] _joe_: hi. 
Wanna get Jenkins jobs to build a .deb package for calico-containers , [08:53:18] ? [08:53:23] quite trivial to set up in CI [08:53:42] (assuming gbp works out of the box it should pass) [09:02:49] <_joe_> hashar: no it does not, it's a shameful hack that involves docker and profanities (multiple) [09:03:01] <_joe_> and binary deb packages, yuck [09:03:17] so the build is barely reproducible is it ? [09:03:27] <_joe_> define reproducible [09:03:47] git clone && gbp buildpackage ==> some.deb I can install :] [09:03:51] <_joe_> it is reproducible on any machine with the role::calico::builder applied, in theory [09:04:06] so that is not so shameful! [09:04:40] I will skip/ignore adding a jenkins job there [09:04:41] <_joe_> hashar: reproducible builds are https://wiki.debian.org/ReproducibleBuilds [09:04:51] oh yeah sorry for the confusion [09:05:05] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 41 probes of 411 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map [09:05:56] <_joe_> hashar: in the not too distant future, we could create a special job going to special CI slaves [09:06:21] <_joe_> but not today, I'll add that to the "Kubernetes todo list" :P [09:10:05] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 2 probes of 411 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map [09:16:19] (03CR) 10Ema: [C: 031] "Nice, we can also use it for ATS when the time comes :P. LGTM." 
[puppet] - 10https://gerrit.wikimedia.org/r/321567 (owner: 10Filippo Giunchedi) [09:32:13] (03PS3) 10Giuseppe Lavagetto: calico: add module to build calico/node and calicoctl [puppet] - 10https://gerrit.wikimedia.org/r/321383 [09:32:15] PROBLEM - NTP on dataset1001 is CRITICAL: NTP CRITICAL: Offset unknown [09:33:12] that's after the reboot, I restarted ntp ^^ [09:33:49] should recover soonish [09:34:43] (03CR) 10Alexandros Kosiaris: "So while moving the package list in hiera itself is fine, this contradicts point #2 of https://phabricator.wikimedia.org/T147718 where it " [puppet] - 10https://gerrit.wikimedia.org/r/321495 (owner: 10Yuvipanda) [09:35:09] (03CR) 10Giuseppe Lavagetto: [C: 031] "It's a good idea, and the code seems correct; minor suggestion on a term in comments" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/321567 (owner: 10Filippo Giunchedi) [09:36:47] (03CR) 10Giuseppe Lavagetto: [C: 031] "@akosiaris it will be enough to make the call to this class explicit where it gets included, this is not in contrast with T147718 directly" [puppet] - 10https://gerrit.wikimedia.org/r/321495 (owner: 10Yuvipanda) [09:38:52] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1064 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321611 [09:40:07] (03CR) 10Giuseppe Lavagetto: "@akosiaris this is just included in our "base" class that is, by all means, practically a profile (the "base" profile). 
So we might fix th" [puppet] - 10https://gerrit.wikimedia.org/r/321495 (owner: 10Yuvipanda) [09:40:36] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1064 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321611 (owner: 10Marostegui) [09:41:15] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1064 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321611 (owner: 10Marostegui) [09:42:15] RECOVERY - NTP on dataset1001 is OK: NTP OK: Offset -3.668665886e-05 secs [09:43:34] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1064 - T149079 (duration: 00m 51s) [09:44:10] (03CR) 10Giuseppe Lavagetto: [C: 032] calico: add module to build calico/node and calicoctl [puppet] - 10https://gerrit.wikimedia.org/r/321383 (owner: 10Giuseppe Lavagetto) [09:44:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:44:19] T149079: codfw: Fix S4 commonswiki.templatelinks partitions - https://phabricator.wikimedia.org/T149079 [09:48:56] (03CR) 10Alexandros Kosiaris: "Yes, assuming an explicit invocation of the class it should conform even in the "stricter" version of the RFC where we don't allow implici" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/321495 (owner: 10Yuvipanda) [09:49:49] (03PS2) 10Jcrespo: Enable unix socket authentication everywhere [puppet] - 10https://gerrit.wikimedia.org/r/320822 (https://phabricator.wikimedia.org/T150446) [09:49:51] (03PS8) 10Jcrespo: mariadb-labs: Prepare db1095 to be the new sanitarium host [puppet] - 10https://gerrit.wikimedia.org/r/320752 (https://phabricator.wikimedia.org/T149829) [09:49:53] (03PS1) 10Jcrespo: toolsdb: Ignore events from s51071__templatetiger_p, s52721__pagecount_stats_p [puppet] - 10https://gerrit.wikimedia.org/r/321618 (https://phabricator.wikimedia.org/T150553) [09:50:07] (03PS2) 10Jcrespo: toolsdb: Ignore events from s51071__templatetiger_p, s52721__pagecount_stats_p [puppet] - 
10https://gerrit.wikimedia.org/r/321618 (https://phabricator.wikimedia.org/T150553) [09:50:57] (03PS3) 10Jcrespo: toolsdb: Ignore events from s51071__templatetiger_p, s52721__pagecount_stats_p [puppet] - 10https://gerrit.wikimedia.org/r/321618 (https://phabricator.wikimedia.org/T150553) [09:52:34] (03CR) 10Jcrespo: [C: 032] toolsdb: Ignore events from s51071__templatetiger_p, s52721__pagecount_stats_p [puppet] - 10https://gerrit.wikimedia.org/r/321618 (https://phabricator.wikimedia.org/T150553) (owner: 10Jcrespo) [09:54:37] (03CR) 10Marostegui: "Did you add the comment? I am not able to see it in the second patch you pushed." [puppet] - 10https://gerrit.wikimedia.org/r/320822 (https://phabricator.wikimedia.org/T150446) (owner: 10Jcrespo) [10:00:21] (03CR) 10Jcrespo: "This is because a dependent patch 370660039ad32, I did not willingly touch it yet." [puppet] - 10https://gerrit.wikimedia.org/r/320822 (https://phabricator.wikimedia.org/T150446) (owner: 10Jcrespo) [10:01:16] (03CR) 10Marostegui: "Ah thanks - just wanted to make sure I wasn't missing something :)" [puppet] - 10https://gerrit.wikimedia.org/r/320822 (https://phabricator.wikimedia.org/T150446) (owner: 10Jcrespo) [10:06:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [10:07:15] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [10:10:37] (03PS3) 10Addshore: Enable RevisionSlider (non BetaFeature) on de,ar,hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319541 (https://phabricator.wikimedia.org/T148646) [10:11:04] (03CR) 10Addshore: [C: 04-2] "Squashed into https://gerrit.wikimedia.org/r/#/c/319541 to be deployed on 22nd November" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321103 (https://phabricator.wikimedia.org/T150573) (owner: 10Arseny1992) [10:12:11] (03CR) 10Marostegui: mariadb-labs: Prepare db1095 to be the 
new sanitarium host (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/320752 (https://phabricator.wikimedia.org/T149829) (owner: 10Jcrespo) [10:14:33] (03PS1) 10Giuseppe Lavagetto: role::calico::builder: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/321621 [10:21:16] (03CR) 10Giuseppe Lavagetto: [C: 032] role::calico::builder: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/321621 (owner: 10Giuseppe Lavagetto) [10:21:24] (03PS2) 10Giuseppe Lavagetto: role::calico::builder: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/321621 [10:21:32] (03CR) 10Giuseppe Lavagetto: [V: 032] role::calico::builder: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/321621 (owner: 10Giuseppe Lavagetto) [10:25:15] (03PS1) 10Giuseppe Lavagetto: profile::calico::builder: s/hieradata/hiera/ [puppet] - 10https://gerrit.wikimedia.org/r/321624 [10:30:57] <_joe_> jeez jenkins [10:31:04] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] profile::calico::builder: s/hieradata/hiera/ [puppet] - 10https://gerrit.wikimedia.org/r/321624 (owner: 10Giuseppe Lavagetto) [10:42:49] (03CR) 10Jcrespo: "Why the URL, specialy if you already know about it? I would prefer it being on our documentation, rather than on an external url. My contr" [puppet] - 10https://gerrit.wikimedia.org/r/320822 (https://phabricator.wikimedia.org/T150446) (owner: 10Jcrespo) [10:43:56] (03PS1) 10Muehlenhoff: Use different host as canary host for debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/321627 [10:44:43] (03CR) 10Marostegui: "Yes, that also fine by me. I suggested the already MariaDB existing documentation because it is already done. 
But if we document it so we " [puppet] - 10https://gerrit.wikimedia.org/r/320822 (https://phabricator.wikimedia.org/T150446) (owner: 10Jcrespo) [10:48:55] PROBLEM - DPKG on mw2229 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:25] PROBLEM - DPKG on mw2216 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:35] PROBLEM - DPKG on mw2221 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:35] PROBLEM - DPKG on mw2230 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:35] PROBLEM - DPKG on mw2225 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:35] PROBLEM - DPKG on mw2222 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:35] PROBLEM - DPKG on mw2231 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:35] PROBLEM - DPKG on mw2218 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:36] PROBLEM - DPKG on mw2217 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:37] PROBLEM - DPKG on mw2215 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:37] PROBLEM - DPKG on mw2226 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:38] PROBLEM - DPKG on mw2232 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:38] PROBLEM - DPKG on mw2228 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:39] PROBLEM - DPKG on mw2219 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:39] PROBLEM - DPKG on mw2224 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:45] PROBLEM - DPKG on mw2220 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:45] PROBLEM - DPKG on mw2227 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:49:45] PROBLEM - DPKG on mw2223 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:50:55] RECOVERY - DPKG on mw2229 is OK: All packages OK [10:51:15] moritzm: is it you? 
^^^ [10:51:18] ^ignore these, fixing those up [10:51:23] ack :D [10:51:35] PROBLEM - puppet last run on mw2223 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[php-pear] [10:51:45] RECOVERY - DPKG on mw2220 is OK: All packages OK [10:52:35] RECOVERY - DPKG on mw2215 is OK: All packages OK [10:53:05] PROBLEM - puppet last run on mw2231 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[php-pear] [10:54:25] RECOVERY - DPKG on mw2216 is OK: All packages OK [10:54:35] PROBLEM - puppet last run on mw2219 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[php-pear] [10:55:35] RECOVERY - DPKG on mw2221 is OK: All packages OK [10:55:35] RECOVERY - DPKG on mw2217 is OK: All packages OK [10:55:35] RECOVERY - DPKG on mw2218 is OK: All packages OK [10:55:35] RECOVERY - DPKG on mw2219 is OK: All packages OK [10:56:15] (03PS9) 10Jcrespo: mariadb-labs: Prepare db1095 to be the new sanitarium host [puppet] - 10https://gerrit.wikimedia.org/r/320752 (https://phabricator.wikimedia.org/T149829) [10:56:35] RECOVERY - DPKG on mw2225 is OK: All packages OK [10:56:35] RECOVERY - DPKG on mw2222 is OK: All packages OK [10:56:35] RECOVERY - DPKG on mw2224 is OK: All packages OK [10:56:45] RECOVERY - DPKG on mw2223 is OK: All packages OK [10:57:05] PROBLEM - puppet last run on mw2217 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[php-pear] [10:57:35] RECOVERY - DPKG on mw2230 is OK: All packages OK [10:57:35] RECOVERY - DPKG on mw2226 is OK: All packages OK [10:57:35] RECOVERY - DPKG on mw2228 is OK: All packages OK [10:57:45] RECOVERY - DPKG on mw2227 is OK: All packages OK [10:58:35] PROBLEM - puppet last run on mw2228 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[php-pear] [10:59:35] RECOVERY - DPKG on mw2231 is OK: All packages OK [10:59:35] RECOVERY - DPKG on mw2232 is OK: All packages OK [11:00:29] (03CR) 10Muehlenhoff: [C: 032] Use different host as canary host for debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/321627 (owner: 10Muehlenhoff) [11:00:33] (03PS2) 10Muehlenhoff: Use different host as canary host for debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/321627 [11:01:35] PROBLEM - puppet last run on mw2232 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[php-pear] [11:05:35] RECOVERY - puppet last run on mw2232 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [11:18:35] RECOVERY - puppet last run on mw2223 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [11:21:05] RECOVERY - puppet last run on mw2231 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [11:22:35] RECOVERY - puppet last run on mw2219 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [11:23:55] PROBLEM - DPKG on mw1263 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:23:55] PROBLEM - DPKG on mw1262 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:23:55] RECOVERY - puppet last run on mw2217 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [11:24:15] PROBLEM - DPKG on mw1261 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:24:15] PROBLEM - puppet last run on elastic1023 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [11:24:18] (03CR) 10Ema: [C: 032] 4.1.3-1wm4: gethdr_extrachance [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/321384 (owner: 10Ema) [11:24:25] PROBLEM - DPKG on mw1264 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:24:55] PROBLEM - DPKG on mw1265 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:25:55] PROBLEM - puppet last run on mw1262 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[php-pear] [11:26:01] (03PS1) 10Volans: wmf-reimage: fix power cycle if power is off [puppet] - 10https://gerrit.wikimedia.org/r/321633 (https://phabricator.wikimedia.org/T150448) [11:26:35] RECOVERY - puppet last run on mw2228 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [11:30:15] RECOVERY - DPKG on mw1261 is OK: All packages OK [11:30:25] RECOVERY - DPKG on mw1264 is OK: All packages OK [11:30:55] RECOVERY - DPKG on mw1265 is OK: All packages OK [11:30:55] RECOVERY - DPKG on mw1263 is OK: All packages OK [11:30:55] RECOVERY - DPKG on mw1262 is OK: All packages OK [11:31:55] PROBLEM - puppet last run on mw1261 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[php-pear] [11:32:18] (03PS1) 10Giuseppe Lavagetto: profile::calico::builder: fix dependencies [puppet] - 10https://gerrit.wikimedia.org/r/321634 [11:32:45] PROBLEM - puppet last run on mw1264 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[molly-guard] [11:38:15] PROBLEM - DPKG on mw1268 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:38:15] PROBLEM - DPKG on mw1270 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:38:15] PROBLEM - DPKG on mw1267 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:38:15] PROBLEM - DPKG on mw1269 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:38:35] PROBLEM - DPKG on mw1271 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:39:15] RECOVERY - DPKG on mw1268 is OK: All packages OK [11:39:15] RECOVERY - DPKG on mw1270 is OK: All packages OK [11:39:15] RECOVERY - DPKG on mw1267 is OK: All packages OK [11:39:15] RECOVERY - DPKG on mw1269 is OK: All packages OK [11:39:35] RECOVERY - DPKG on mw1271 is OK: All packages OK [11:41:03] (03CR) 10Giuseppe Lavagetto: [C: 032] profile::calico::builder: fix dependencies [puppet] - 10https://gerrit.wikimedia.org/r/321634 (owner: 10Giuseppe Lavagetto) [11:50:06] (03CR) 10Marostegui: [C: 031] wmf-reimage: fix power cycle if power is off [puppet] - 10https://gerrit.wikimedia.org/r/321633 (https://phabricator.wikimedia.org/T150448) (owner: 10Volans) [11:52:15] RECOVERY - puppet last run on elastic1023 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [11:53:55] RECOVERY - puppet last run on mw1262 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [11:59:55] RECOVERY - puppet last run on mw1261 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [12:00:45] RECOVERY - puppet last run on mw1264 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [12:01:52] !log Deploy schema change labsdb1003 s4 commonswiki revision table T147305 [12:02:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:02:43] T147305: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305 [12:08:36] (03CR) 10Giuseppe 
Lavagetto: [C: 031] wmf-reimage: fix power cycle if power is off [puppet] - 10https://gerrit.wikimedia.org/r/321633 (https://phabricator.wikimedia.org/T150448) (owner: 10Volans) [12:17:55] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:18:05] (03PS1) 10Giuseppe Lavagetto: profile::calico::builder: fix file url, template [puppet] - 10https://gerrit.wikimedia.org/r/321637 [12:21:02] (03CR) 10Giuseppe Lavagetto: [C: 032] profile::calico::builder: fix file url, template [puppet] - 10https://gerrit.wikimedia.org/r/321637 (owner: 10Giuseppe Lavagetto) [12:22:55] (03PS1) 10Marostegui: misc.my.cnf.erb: Enable barracuda for all the misc shards [puppet] - 10https://gerrit.wikimedia.org/r/321638 [12:29:02] (03CR) 10Jcrespo: [C: 031] "Actually, this was hold off because labsdb1004 -> labsdb1005 breaks due to differences on large_prefix. So it has to be applied on master " [puppet] - 10https://gerrit.wikimedia.org/r/321638 (owner: 10Marostegui) [12:30:18] !log installing libgd security updates [12:31:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:31:24] (03CR) 10Marostegui: "Thanks for the explanation, I will not deploy today, but it is good to know that there is not a big blocker on going full on with misc ser" [puppet] - 10https://gerrit.wikimedia.org/r/321638 (owner: 10Marostegui) [12:39:55] !log installing pillow security updates on eqiad app servers [12:40:35] (03CR) 10Marostegui: mariadb-labs: Prepare db1095 to be the new sanitarium host (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/320752 (https://phabricator.wikimedia.org/T149829) (owner: 10Jcrespo) [12:40:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:55] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [12:57:15] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Update 
sopm description [software/otrs] - 10https://gerrit.wikimedia.org/r/274442 (owner: 10Alexandros Kosiaris) [12:58:29] (03CR) 10Alexandros Kosiaris: [C: 032] icinga: Specify mode for nagios_host, nagios_service [puppet] - 10https://gerrit.wikimedia.org/r/317791 (owner: 10Alexandros Kosiaris) [12:58:34] (03PS10) 10Alexandros Kosiaris: icinga: Specify mode for nagios_host, nagios_service [puppet] - 10https://gerrit.wikimedia.org/r/317791 [12:58:37] (03CR) 10Alexandros Kosiaris: [V: 032] icinga: Specify mode for nagios_host, nagios_service [puppet] - 10https://gerrit.wikimedia.org/r/317791 (owner: 10Alexandros Kosiaris) [12:59:26] (03PS1) 10Volans: icinga: raid_handler improvements [puppet] - 10https://gerrit.wikimedia.org/r/321642 (https://phabricator.wikimedia.org/T149913) [13:06:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [13:10:15] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [13:13:22] !log restarting app server canaries to pick up libgd security update [13:13:42] (03PS1) 10Giuseppe Lavagetto: profile::calico::builder: clone to /srv/calico-containers [puppet] - 10https://gerrit.wikimedia.org/r/321648 [13:14:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:15:17] (03PS1) 10Hashar: contint: remove Apache 2.2 compatibility config [puppet] - 10https://gerrit.wikimedia.org/r/321650 (https://phabricator.wikimedia.org/T150727) [13:17:32] (03PS2) 10Giuseppe Lavagetto: profile::calico::builder: clone to /srv/calico-containers [puppet] - 10https://gerrit.wikimedia.org/r/321648 [13:19:36] (03Abandoned) 10BBlack: Test another write buffer size theory for extra RTT [software/nginx] (wmf-1.11.4) - 10https://gerrit.wikimedia.org/r/320939 (owner: 10BBlack) [13:19:41] (03Abandoned) 10BBlack: nginx (1.11.4-1+wmf15) jessie-wikimedia; urgency=medium [software/nginx] 
(wmf-1.11.4) - 10https://gerrit.wikimedia.org/r/320940 (owner: 10BBlack) [13:19:58] (03PS1) 10Hashar: contint: allow .htaccess on doc.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/321651 (https://phabricator.wikimedia.org/T150727) [13:21:48] !log installing dbus security updates on trusty systems [13:22:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:22:39] jouncebot ... [13:22:42] stupid bots [13:26:11] error: can't open output file "/data/project/jouncebot/jouncebot. [13:26:11] bah [13:26:13] (03PS3) 10BBlack: VCL: fixups for synthetic error status [puppet] - 10https://gerrit.wikimedia.org/r/320975 [13:27:01] !log Attempting to restart jouncebot [13:27:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:28:04] hashar: due to labs maintenance, that's why [13:28:47] then it refuses to come back [13:28:52] queue instance "task@tools-exec-1410.eqiad.wmflabs" dropped because it is temporarily not available [13:28:56] queue instance "webgrid-lighttpd@tools-webgrid-lighttpd-1206.eqiad.wmflabs" dropped because it is temporarily not available [13:28:57] bah [13:29:31] jouncebot: status [13:29:43] jouncebot: next [13:29:43] In 0 hour(s) and 30 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161115T1400) [13:29:46] jouncebot: update [13:29:52] jouncebot: refresh [13:29:54] I refreshed my knowledge about deployments. [13:30:10] !log Jouncebot is back [13:30:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:34:55] PROBLEM - Check correctness of the icinga configuration on einsteinium is CRITICAL: Icinga configuration contains errors [13:37:55] PROBLEM - puppet last run on mc1020 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:43:20] (03PS1) 10Muehlenhoff: Update to 4.4.32 [debs/linux44] - 10https://gerrit.wikimedia.org/r/321653 [13:44:29] (03CR) 10Ema: [C: 031] VCL: fixups for synthetic error status [puppet] - 10https://gerrit.wikimedia.org/r/320975 (owner: 10BBlack) [13:49:45] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:50:10] zeljkof: there is nothing to deploy :] [13:50:21] (03CR) 10Jcrespo: "Isn't there a backups::otrs class? Maybe a directory instead?" [puppet] - 10https://gerrit.wikimedia.org/r/320989 (owner: 10Marostegui) [13:50:23] hashar: yes, just checked [13:50:30] it will be a short swat :) [13:51:45] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy [13:55:17] (03PS1) 1020after4: Enable multiple config files in phabricator [puppet] - 10https://gerrit.wikimedia.org/r/321654 (https://phabricator.wikimedia.org/T146055) [13:57:21] (03CR) 10jenkins-bot: [V: 04-1] Enable multiple config files in phabricator [puppet] - 10https://gerrit.wikimedia.org/r/321654 (https://phabricator.wikimedia.org/T146055) (owner: 1020after4) [13:57:43] (03CR) 1020after4: "Note: I'll need a dba to create the mysql account and place the credentials in puppet/private." [puppet] - 10https://gerrit.wikimedia.org/r/321654 (https://phabricator.wikimedia.org/T146055) (owner: 1020after4) [13:59:31] (03PS2) 1020after4: Enable multiple config files in phabricator [puppet] - 10https://gerrit.wikimedia.org/r/321654 (https://phabricator.wikimedia.org/T146055) [14:00:05] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161115T1400). [14:00:32] oh balls, I forgot to add my thing....! 
[14:00:53] hashar: zeljkof if you wouldn't mind deploying mine that would be great! (I'm mid-meeting) [14:01:17] https://gerrit.wikimedia.org/r/#/c/319540/ (adding to the cal now) [14:01:21] addshore: I'm around... [14:01:46] (03CR) 10jenkins-bot: [V: 04-1] Enable multiple config files in phabricator [puppet] - 10https://gerrit.wikimedia.org/r/321654 (https://phabricator.wikimedia.org/T146055) (owner: 1020after4) [14:03:22] (03CR) 10Muehlenhoff: [C: 032] Update to 4.4.32 [debs/linux44] - 10https://gerrit.wikimedia.org/r/321653 (owner: 10Muehlenhoff) [14:03:47] wikibugs is still dead from yesterday? [14:03:52] or newly-dead? [14:04:16] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [14:05:07] zeljkof: hashar added to the calendar, if you don't have time I'll slot it in another window! [14:05:50] bblack: At least in the databases channel, the last time it was seen was yesterday [14:06:15] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [14:06:55] RECOVERY - puppet last run on mc1020 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [14:07:04] yeah last time I saw a message from it was ~20h ago [14:07:38] (03PS3) 1020after4: Enable multiple config files in phabricator [puppet] - 10https://gerrit.wikimedia.org/r/321654 (https://phabricator.wikimedia.org/T146055) [14:07:45] addshore: I can deploy it now, can you test it at mw1099? [14:07:54] addshore: not sure I got that change [14:07:55] yup! [14:08:08] you drop it from beta features so that disables it? 
[14:08:19] hashar: I'll deploy it, if it looks good to you [14:08:26] hashar: drop it from betafeatures, so it is enabled by default [14:09:02] hashar: the one for betawiki was rolled out this time last week :) [14:09:18] (03PS2) 10Volans: wmf-reimage: fix power cycle if power is off [puppet] - 10https://gerrit.wikimedia.org/r/321633 (https://phabricator.wikimedia.org/T150448) [14:09:32] addshore: so betafeature enabled means it is disabled unless opted in? :D [14:09:37] ahazehazeh [14:09:47] hashar: indeed ;) [14:09:53] !log starting EU SWAT [14:10:07] (03PS2) 10Hashar: Enable RevisionSlider (non BF) on test & mediawiki wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319540 (https://phabricator.wikimedia.org/T149724) (owner: 10Addshore) [14:10:15] go for it zeljkof :] [14:10:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:10:39] oh my god https://www.mediawiki.org/wiki/Extension:RevisionSlider#/media/File:RevisionSlider_2016_05_31.png [14:10:44] hashar: when you are done, maybe you know why https://gerrit.wikimedia.org/r/#/c/321656/ (release tools) fails? [14:11:19] tox-jessie [14:11:28] hashar: oh, problems, can not log in to gerrit since I have changed my wikitech password :| [14:11:43] hashar, addshore: could you please +2 the commit? [14:12:04] (03CR) 10Addshore: [C: 032] Enable RevisionSlider (non BF) on test & mediawiki wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319540 (https://phabricator.wikimedia.org/T149724) (owner: 10Addshore) [14:12:05] remembers me about "Zak Greant wrote a Greasemonkey script "MediaWiki history Sparklines"" history at https://phabricator.wikimedia.org/T43329 [14:12:36] (03Merged) 10jenkins-bot: Enable RevisionSlider (non BF) on test & mediawiki wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319540 (https://phabricator.wikimedia.org/T149724) (owner: 10Addshore) [14:12:37] zeljkof: ^^ [14:13:08] addshore: thanks! [14:16:39] zeljkof: looks good on 1099! 
[14:16:42] addshore: 319540 is at mw1099, please test [14:16:47] oh, that was quick :) [14:16:49] ;) [14:16:56] ok, deploying to the universe then... [14:16:58] aude: that is the python linter (flake8). Will fix them and rebase your patch [14:17:05] Brilliant! many thanks! [14:17:39] thanks [14:18:53] aude: should be good now :] [14:18:57] thanks [14:19:06] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:319540|Enable RevisionSlider (non BF) on test & mediawiki wikis (T149724)]] (duration: 00m 57s) [14:19:13] [= [14:19:32] addshore: deployed, please check [14:19:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:19:52] T149724: Enable RevisionSlider (non betafeature) on testwiki, test2wiki, mediawikiwiki and testwikidatawiki - https://phabricator.wikimedia.org/T149724 [14:20:44] hm, there was one error [14:20:52] 14:18:55 ['/usr/bin/scap', 'pull', '--no-update-l10n', '--include', 'wmf-config', '--include', 'wmf-config/InitialiseSettings.php', 'mw1211.eqiad.wmnet', 'mw1280.eqiad.wmnet', 'mw1161.eqiad.wmnet', 'mw2119.codfw.wmnet', 'mw2215.codfw.wmnet', 'mw2080.codfw.wmnet', 'mw1201.eqiad.wmnet', 'mw2187.codfw.wmnet', 'mw1216.eqiad.wmnet'] on [14:20:52] labtestweb2001.wikimedia.org returned [255]: Host key verification failed. [14:21:10] https://phabricator.wikimedia.org/P4436 [14:21:16] hashar, addshore: ^ [14:22:02] zeljkof: looks good on the sites! hmm, no idea about that error [14:22:24] addshore: ok, in that case finishing the swat, will check with scap people later [14:22:36] !log EU SWAT finished! 
[14:23:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:25:21] that needs a manual pull on the failed hosts and an update of the keys [14:25:37] (03PS1) 10Muehlenhoff: Add debdeploy grains for puppetdb hosts [puppet] - 10https://gerrit.wikimedia.org/r/321659 [14:26:19] visitors who may land on pages served by those hosts won't see the change [14:26:51] [14:27:36] I asked andrew about that host as it looks like he's been doing stuff on it [14:28:30] (03CR) 10Muehlenhoff: [C: 032] Add debdeploy grains for puppetdb hosts [puppet] - 10https://gerrit.wikimedia.org/r/321659 (owner: 10Muehlenhoff) [14:29:55] (03CR) 10Hashar: [C: 031] "We had that discussion when adding java 8 to the CI slaves ( https://gerrit.wikimedia.org/r/#/c/295880/ ) quoting:" [puppet] - 10https://gerrit.wikimedia.org/r/321398 (owner: 10Chad) [14:30:29] marostegui on all the hosts mentioned in the scap P4436 paste? [14:30:50] aude: https://gerrit.wikimedia.org/r/#/c/321658/ :) [14:31:08] zeljkof: does it break sync? [14:31:13] arseny92: There is only one that failed there, right?: labtestweb2001.wikimedia.org [14:31:19] (03PS1) 10MarcoAurelio: Set 'abusefilter-modify-global' to stewards locally at Meta-Wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321660 (https://phabricator.wikimedia.org/T150752) [14:31:21] zeljkof: forget labtestweb2001 .. probably something being reinstalled and it is a test [14:31:34] zeljkof: I believe that is a test machine for wikitech.wikimedia.org . Hardly a problem [14:32:12] zeljkof: has nothing to do with scap. It is just that labtestweb2001 is either new or had its ssh host key changed on reinstall [14:32:26] and thus it is not known on tin.eqiad.wmnet /etc/ssh/known_hosts [14:32:35] marostegui pages are served randomly by mw???? 
hosts [14:32:43] that file is autogenerated by puppet once in a while based on ssh host keys collected on every server running puppet [14:32:48] so not a problem [14:33:07] arseny92: What I am saying is that in that paste, only one host failed, which is labtestweb2001 if I am reading it right [14:33:37] marostegui: correct [14:34:34] isn't labtestweb2001 just the public endpoint? [14:34:39] "and it is a test" is not a valid train of thought [14:35:06] we are syncing private information there, so it is as production as any other host [14:35:53] I am talking to Andrew already, he is going to take a look [14:39:40] aude mind CR+2 the python fixup https://gerrit.wikimedia.org/r/#/c/321658/ ? :] [14:40:25] PROBLEM - puppet last run on mc1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:40:40] (03PS6) 10Andrew Bogott: openstack: cache mwclient connection in wikistatus [puppet] - 10https://gerrit.wikimedia.org/r/321169 (owner: 10BryanDavis) [14:43:31] (03CR) 10Giuseppe Lavagetto: [C: 032] profile::calico::builder: clone to /srv/calico-containers [puppet] - 10https://gerrit.wikimedia.org/r/321648 (owner: 10Giuseppe Lavagetto) [14:44:18] (03PS3) 10Giuseppe Lavagetto: profile::calico::builder: clone to /srv/calico-containers [puppet] - 10https://gerrit.wikimedia.org/r/321648 [14:44:30] (03CR) 10Giuseppe Lavagetto: [V: 032] profile::calico::builder: clone to /srv/calico-containers [puppet] - 10https://gerrit.wikimedia.org/r/321648 (owner: 10Giuseppe Lavagetto) [14:44:48] apaches from one of the hosts mentioned in the paste [14:44:52] https://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Application+servers+eqiad&h=mw1211.eqiad.wmnet&jr=&js=&event=hide&ts=0&v=0.204452&m=ap_cpuload&vl=pct&ti=Pct+of+time+CPU+utilized [14:46:42] (03PS7) 10Andrew Bogott: openstack: cache mwclient connection in wikistatus [puppet] - 10https://gerrit.wikimedia.org/r/321169 (owner: 10BryanDavis) [14:52:25] PROBLEM - puppet last run 
on mc1008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[strace] [14:52:53] (03CR) 10Jgreen: [C: 031] "lgtm but I'm not super familiar with icinga innards" [puppet] - 10https://gerrit.wikimedia.org/r/321642 (https://phabricator.wikimedia.org/T149913) (owner: 10Volans) [14:53:20] arseny92: That server has exactly the same file that was pushed here:https://gerrit.wikimedia.org/r/#/c/319540/2/wmf-config/InitialiseSettings.php (same md5, same content)… [15:02:03] (03PS4) 10Faidon Liambotis: fix reverse names of ulsfo mgmt interfaces [dns] - 10https://gerrit.wikimedia.org/r/321581 (https://phabricator.wikimedia.org/T149875) (owner: 10Dzahn) [15:02:27] (03CR) 10Faidon Liambotis: [C: 032] fix reverse names of ulsfo mgmt interfaces [dns] - 10https://gerrit.wikimedia.org/r/321581 (https://phabricator.wikimedia.org/T149875) (owner: 10Dzahn) [15:03:43] (03CR) 10Andrew Bogott: [C: 032] openstack: cache mwclient connection in wikistatus [puppet] - 10https://gerrit.wikimedia.org/r/321169 (owner: 10BryanDavis) [15:04:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [15:07:15] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [15:08:25] RECOVERY - puppet last run on mc1005 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [15:10:53] 4 minutes for jenkins to run the authdns linting [15:10:58] that's ridiculous [15:13:58] [WCsmIApAMEwAALRXc@MAAAAT] 2016-11-15 15:13:41: Fatal exception of type "DBTransactionSizeError" [15:15:31] (03PS3) 10Andrew Bogott: wikistatus: Handle a few more state changes [puppet] - 10https://gerrit.wikimedia.org/r/321233 [15:15:33] (03PS1) 10Andrew Bogott: Wikistatus: Fix a bug with page deletion [puppet] - 10https://gerrit.wikimedia.org/r/321666 [15:19:25] RECOVERY - puppet last 
run on mc1008 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [15:21:05] (03PS4) 10BBlack: VCL: fixups for synthetic error status [puppet] - 10https://gerrit.wikimedia.org/r/320975 [15:21:07] (03PS1) 10BBlack: VCL: synthetic() outputs for >=400 [puppet] - 10https://gerrit.wikimedia.org/r/321669 [15:24:25] PROBLEM - puppet last run on aqs1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:26:41] (03CR) 10Ottomata: "> In any case, this commit also needs to be split up such that we can deploy the internal LVS service separately from turning on the publi" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/320690 (https://phabricator.wikimedia.org/T143925) (owner: 10Ottomata) [15:26:44] (03PS1) 10Alexandros Kosiaris: Revert "icinga: Specify mode for nagios_host, nagios_service" [puppet] - 10https://gerrit.wikimedia.org/r/321670 [15:26:51] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Revert "icinga: Specify mode for nagios_host, nagios_service" [puppet] - 10https://gerrit.wikimedia.org/r/321670 (owner: 10Alexandros Kosiaris) [15:26:56] (03PS2) 10Alexandros Kosiaris: Revert "icinga: Specify mode for nagios_host, nagios_service" [puppet] - 10https://gerrit.wikimedia.org/r/321670 [15:26:58] (03CR) 10Alexandros Kosiaris: [V: 032] Revert "icinga: Specify mode for nagios_host, nagios_service" [puppet] - 10https://gerrit.wikimedia.org/r/321670 (owner: 10Alexandros Kosiaris) [15:27:17] (03PS3) 10Ottomata: Deploy EventStreams on scb and configure LVS service in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/320690 (https://phabricator.wikimedia.org/T143925) [15:28:40] (03CR) 10Ottomata: "Also, I'd like to propose a different name for this service than 'eventstreams': How about:" [puppet] - 10https://gerrit.wikimedia.org/r/320690 (https://phabricator.wikimedia.org/T143925) (owner: 10Ottomata) [15:29:18] (03CR) 10Andrew Bogott: [C: 032] Wikistatus: Fix a bug with page deletion [puppet] 
- 10https://gerrit.wikimedia.org/r/321666 (owner: 10Andrew Bogott) [15:29:23] (03PS2) 10Andrew Bogott: Wikistatus: Fix a bug with page deletion [puppet] - 10https://gerrit.wikimedia.org/r/321666 [15:38:10] (03PS1) 10Giuseppe Lavagetto: profile::calico::builder: fix build-calico, add fakeroot [puppet] - 10https://gerrit.wikimedia.org/r/321672 [15:38:12] (03PS4) 10Andrew Bogott: wikistatus: Handle a few more state changes [puppet] - 10https://gerrit.wikimedia.org/r/321233 [15:40:30] (03PS5) 10BBlack: VCL: fixups for synthetic error status [puppet] - 10https://gerrit.wikimedia.org/r/320975 [15:40:33] (03CR) 10Giuseppe Lavagetto: [C: 032] profile::calico::builder: fix build-calico, add fakeroot [puppet] - 10https://gerrit.wikimedia.org/r/321672 (owner: 10Giuseppe Lavagetto) [15:40:39] (03CR) 10BBlack: [C: 032 V: 032] VCL: fixups for synthetic error status [puppet] - 10https://gerrit.wikimedia.org/r/320975 (owner: 10BBlack) [15:40:51] (03PS6) 10BBlack: VCL: fixups for synthetic error status [puppet] - 10https://gerrit.wikimedia.org/r/320975 [15:40:53] (03CR) 10BBlack: [V: 032] VCL: fixups for synthetic error status [puppet] - 10https://gerrit.wikimedia.org/r/320975 (owner: 10BBlack) [15:41:02] (03PS2) 10BBlack: VCL: synthetic() outputs for >=400 [puppet] - 10https://gerrit.wikimedia.org/r/321669 [15:41:04] (03CR) 10Andrew Bogott: [C: 032] wikistatus: Handle a few more state changes [puppet] - 10https://gerrit.wikimedia.org/r/321233 (owner: 10Andrew Bogott) [15:41:09] (03PS5) 10Andrew Bogott: wikistatus: Handle a few more state changes [puppet] - 10https://gerrit.wikimedia.org/r/321233 [15:41:14] (03CR) 10BBlack: [C: 032 V: 032] VCL: synthetic() outputs for >=400 [puppet] - 10https://gerrit.wikimedia.org/r/321669 (owner: 10BBlack) [15:41:23] <_joe_> bblack: ok to merge your changes? [15:41:27] smash buttons and eventually changes go through! 
[15:41:30] <_joe_> else, just merge mine [15:41:30] _joe_: yes :) [15:41:48] <_joe_> I merged one, actually [15:41:52] ok [15:42:40] (03PS6) 10Andrew Bogott: wikistatus: Handle a few more state changes [puppet] - 10https://gerrit.wikimedia.org/r/321233 [15:42:58] (03PS1) 10Ladsgroup: Add German Wiktionary in beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321673 [15:48:45] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:49:45] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy [15:52:21] !log upgrading cp2015 to varnish 4.1.3-1wm4 and rebooting with linux 4.4.2-3+wmf7 [15:52:25] RECOVERY - puppet last run on aqs1004 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [15:53:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:40] dapatrick: ping [16:04:57] (03PS2) 10Ladsgroup: Add German Wiktionary in beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321673 (https://phabricator.wikimedia.org/T150764) [16:06:10] !log rolling cache_maps upgrade to varnish 4.1.3-1wm4 and reboot with linux 4.4.2-3+wmf7 [16:06:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:09] Hmm another account compromise [16:07:22] https://meta.wikimedia.org/wiki/Special:CentralAuth/Yaron_Koren [16:08:03] legoktm ^ [16:08:09] Reedy ^ [16:09:52] already locked [16:09:56] yeah [16:10:43] Noted, thx [16:10:46] or has fishbowl wikis access on unattached accounts? [16:10:55] (03PS1) 10Andrew Bogott: wikistatus: Replace the max_retries throttle for wiki login [puppet] - 10https://gerrit.wikimedia.org/r/321677 (https://phabricator.wikimedia.org/T150373) [16:12:10] (03PS2) 10Andrew Bogott: wikistatus: Replace the max_retries throttle for wiki login [puppet] - 10https://gerrit.wikimedia.org/r/321677 (https://phabricator.wikimedia.org/T150373) [16:12:52] arseny92, what? 
[16:13:53] (03CR) 10Andrew Bogott: [C: 032] wikistatus: Replace the max_retries throttle for wiki login [puppet] - 10https://gerrit.wikimedia.org/r/321677 (https://phabricator.wikimedia.org/T150373) (owner: 10Andrew Bogott) [16:15:02] that was a question [16:15:13] arseny92, yeah but I think it was missing some words [16:15:57] well im doing 30things a time [16:16:35] PROBLEM - puppet last run on cp3018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:24:36] PROBLEM - NTP on cp3004 is CRITICAL: NTP CRITICAL: Offset unknown [16:24:36] (03CR) 10Alexandros Kosiaris: "So, just to make sure I understand you correctly, we are afraid the eventstreams might bring the kafka cluster that powers eventbus+change" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/320690 (https://phabricator.wikimedia.org/T143925) (owner: 10Ottomata) [16:28:52] (03CR) 10Ottomata: Deploy EventStreams on scb and configure LVS service in eqiad (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/320690 (https://phabricator.wikimedia.org/T143925) (owner: 10Ottomata) [16:32:55] (03PS1) 10Alexandros Kosiaris: Update to 1.6.0-2 [debs/etherpad-lite] - 10https://gerrit.wikimedia.org/r/321684 [16:34:38] RECOVERY - NTP on cp3004 is OK: NTP OK: Offset 0.0006060600281 secs [16:34:58] RECOVERY - Check correctness of the icinga configuration on einsteinium is OK: Icinga configuration is correct [16:37:28] PROBLEM - NTP on cp2021 is CRITICAL: NTP CRITICAL: Offset unknown [16:42:48] PROBLEM - NTP on cp4012 is CRITICAL: NTP CRITICAL: Offset unknown [16:44:38] RECOVERY - puppet last run on cp3018 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:47:28] RECOVERY - NTP on cp2021 is OK: NTP OK: Offset 0.0004258751869 secs [16:48:48] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[16:49:48] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy [16:50:37] (03PS1) 10Giuseppe Lavagetto: calicoctl: fix typos, missing fields in the control file [calico-containers] (1.0.0-beta-rc5) - 10https://gerrit.wikimedia.org/r/321687 [16:51:11] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] calicoctl: fix typos, missing fields in the control file [calico-containers] (1.0.0-beta-rc5) - 10https://gerrit.wikimedia.org/r/321687 (owner: 10Giuseppe Lavagetto) [16:52:48] RECOVERY - NTP on cp4012 is OK: NTP OK: Offset 0.0006908178329 secs [16:54:26] (03PS1) 10Rush: labstore1001: clean out stale and bad backup jobs [puppet] - 10https://gerrit.wikimedia.org/r/321689 (https://phabricator.wikimedia.org/T127567) [16:57:28] PROBLEM - NTP on cp4020 is CRITICAL: NTP CRITICAL: Offset unknown [16:57:51] is there a bug in phab for "people who are employed for wmf but not in the wmf LDAP group"? [16:59:05] (03CR) 10Dzahn: "can we just put the content of the .htaccess into regular Apache config templates instead? The only use case for these seems to be when a " [puppet] - 10https://gerrit.wikimedia.org/r/321651 (https://phabricator.wikimedia.org/T150727) (owner: 10Hashar) [17:00:04] godog, moritzm, and _joe_: Respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161115T1700). Please do the needful. [17:00:08] PROBLEM - check_puppetrun on americium is CRITICAL: CRITICAL: Puppet has 18 failures [17:00:08] PROBLEM - check_puppetrun on frdb1001 is CRITICAL: CRITICAL: Puppet has 12 failures [17:00:48] (03CR) 10Madhuvishy: [C: 031] labstore1001: clean out stale and bad backup jobs [puppet] - 10https://gerrit.wikimedia.org/r/321689 (https://phabricator.wikimedia.org/T127567) (owner: 10Rush) [17:01:02] ^^^ americium/frdb1001 checking... 
[17:02:03] (03PS2) 10Rush: labstore1001: clean out stale and bad backup jobs [puppet] - 10https://gerrit.wikimedia.org/r/321689 (https://phabricator.wikimedia.org/T127567) [17:03:57] (03CR) 10Dzahn: [C: 031] "no more needed, sure. but doesnt't hurt either to be more compatible for others, but then probably no others are using this?.. oh well.. n" [puppet] - 10https://gerrit.wikimedia.org/r/321650 (https://phabricator.wikimedia.org/T150727) (owner: 10Hashar) [17:05:08] RECOVERY - check_puppetrun on americium is OK: OK: Puppet is currently enabled, last run 62 seconds ago with 0 failures [17:05:08] RECOVERY - check_puppetrun on frdb1001 is OK: OK: Puppet is currently enabled, last run 208 seconds ago with 0 failures [17:05:55] (03PS3) 10Rush: labstore1001: clean up unused jobs and legacy [puppet] - 10https://gerrit.wikimedia.org/r/321689 (https://phabricator.wikimedia.org/T127567) [17:09:12] is wikibugs gone for good, or? [17:10:27] (03CR) 10Rush: [C: 032] labstore1001: clean up unused jobs and legacy [puppet] - 10https://gerrit.wikimedia.org/r/321689 (https://phabricator.wikimedia.org/T127567) (owner: 10Rush) [17:11:50] bblack: may need to be restarted, I'm not actually sure who owns it? 
[17:12:28] PROBLEM - NTP on cp3006 is CRITICAL: NTP CRITICAL: Offset unknown [17:17:28] RECOVERY - NTP on cp4020 is OK: NTP OK: Offset -1.37090683e-05 secs [17:18:05] (03PS1) 10Rush: labstore: secondary add some monitoring manifests [puppet] - 10https://gerrit.wikimedia.org/r/321690 (https://phabricator.wikimedia.org/T126083) [17:18:08] PROBLEM - NTP on cp1060 is CRITICAL: NTP CRITICAL: Offset unknown [17:19:56] (03CR) 10Rush: [C: 032] labstore: secondary add some monitoring manifests [puppet] - 10https://gerrit.wikimedia.org/r/321690 (https://phabricator.wikimedia.org/T126083) (owner: 10Rush) [17:22:28] RECOVERY - NTP on cp3006 is OK: NTP OK: Offset 0.0007211565971 secs [17:26:41] (03PS6) 10Rush: Explicitly set up /var/spool/gridengine on grid master [puppet] - 10https://gerrit.wikimedia.org/r/321584 (owner: 10Andrew Bogott) [17:28:11] RECOVERY - NTP on cp1060 is OK: NTP OK: Offset -0.0006133615971 secs [17:36:01] (03PS3) 10Volans: wmf-reimage: fix power cycle if power is off [puppet] - 10https://gerrit.wikimedia.org/r/321633 (https://phabricator.wikimedia.org/T150448) [17:37:01] PROBLEM - traffic-pool service on cp1047 is CRITICAL: CRITICAL - Expecting active but unit traffic-pool is failed [17:37:21] PROBLEM - NTP on cp3003 is CRITICAL: NTP CRITICAL: Offset unknown [17:40:38] (03CR) 10Faidon Liambotis: [C: 032] wmf-reimage: fix power cycle if power is off [puppet] - 10https://gerrit.wikimedia.org/r/321633 (https://phabricator.wikimedia.org/T150448) (owner: 10Volans) [17:41:15] (03CR) 10Filippo Giunchedi: prometheus: rename varnish_config to cluster_config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/321567 (owner: 10Filippo Giunchedi) [17:41:21] (03PS2) 10Filippo Giunchedi: prometheus: rename varnish_config to cluster_config [puppet] - 10https://gerrit.wikimedia.org/r/321567 [17:41:23] (03PS2) 10Filippo Giunchedi: role: add Prometheus job for memcached_exporter [puppet] - 10https://gerrit.wikimedia.org/r/321568 
(https://phabricator.wikimedia.org/T147326) [17:46:01] RECOVERY - traffic-pool service on cp1047 is OK: OK - traffic-pool is active [17:47:21] RECOVERY - NTP on cp3003 is OK: NTP OK: Offset 0.0006252527237 secs [17:47:33] !log rolling cache_misc upgrade to varnish 4.1.3-1wm4 and reboot with linux 4.4.2-3+wmf7 [17:48:01] PROBLEM - NTP on cp1047 is CRITICAL: NTP CRITICAL: Offset unknown [17:53:41] PROBLEM - NTP on cp4019 is CRITICAL: NTP CRITICAL: Offset unknown [17:54:55] (03CR) 10Muehlenhoff: [C: 031] "Seems fine indeed." [puppet] - 10https://gerrit.wikimedia.org/r/321398 (owner: 10Chad) [17:56:03] (03CR) 10Alexandros Kosiaris: Deploy EventStreams on scb and configure LVS service in eqiad (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/320690 (https://phabricator.wikimedia.org/T143925) (owner: 10Ottomata) [17:58:01] RECOVERY - NTP on cp1047 is OK: NTP OK: Offset -0.0001730322838 secs [18:00:01] PROBLEM - IPsec on cp2012 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp1058_v4, cp1058_v6 [18:00:04] yurik, gwicke, cscott, arlolra, subbu, halfak, and Amir1: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161115T1800). 
[18:00:41] PROBLEM - IPsec on cp4004 is CRITICAL: Strongswan CRITICAL - ok: 26 not-conn: cp2006_v4, cp2006_v6 [18:01:01] RECOVERY - IPsec on cp2012 is OK: Strongswan OK - 36 ESP OK [18:01:15] no parsoid deploy [18:02:41] RECOVERY - IPsec on cp4004 is OK: Strongswan OK - 28 ESP OK [18:03:31] PROBLEM - IPsec on kafka1012 is CRITICAL: Strongswan CRITICAL - ok: 146 not-conn: cp3010_v4, cp3010_v6 [18:03:41] RECOVERY - NTP on cp4019 is OK: NTP OK: Offset 0.000519990921 secs [18:06:21] PROBLEM - IPsec on kafka1018 is CRITICAL: Strongswan CRITICAL - ok: 146 not-conn: cp3010_v4, cp3010_v6 [18:06:21] PROBLEM - IPsec on kafka1013 is CRITICAL: Strongswan CRITICAL - ok: 146 not-conn: cp3010_v4, cp3010_v6 [18:06:21] PROBLEM - IPsec on kafka1014 is CRITICAL: Strongswan CRITICAL - ok: 146 not-conn: cp3010_v4, cp3010_v6 [18:06:21] PROBLEM - IPsec on cp1045 is CRITICAL: Strongswan CRITICAL - ok: 22 not-conn: cp3010_v4, cp3010_v6 [18:07:51] PROBLEM - NTP on cp1058 is CRITICAL: NTP CRITICAL: Offset unknown [18:09:21] PROBLEM - IPsec on cp2006 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp4002_v4, cp4002_v6 [18:09:41] PROBLEM - IPsec on cp1051 is CRITICAL: Strongswan CRITICAL - ok: 22 not-conn: cp4002_v4, cp4002_v6 [18:10:11] PROBLEM - IPsec on cp2018 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp1045_v4, cp1045_v6 [18:10:11] PROBLEM - IPsec on cp2025 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp1045_v4, cp1045_v6 [18:10:21] RECOVERY - IPsec on kafka1018 is OK: Strongswan OK - 148 ESP OK [18:10:21] RECOVERY - IPsec on kafka1013 is OK: Strongswan OK - 148 ESP OK [18:10:21] RECOVERY - IPsec on kafka1014 is OK: Strongswan OK - 148 ESP OK [18:10:31] RECOVERY - IPsec on kafka1012 is OK: Strongswan OK - 148 ESP OK [18:10:41] RECOVERY - IPsec on cp1051 is OK: Strongswan OK - 24 ESP OK [18:12:21] PROBLEM - NTP on cp4004 is CRITICAL: NTP CRITICAL: Offset unknown [18:14:11] RECOVERY - IPsec on cp2025 is OK: Strongswan OK - 36 ESP OK [18:14:22] RECOVERY - IPsec on cp2006 
is OK: Strongswan OK - 36 ESP OK [18:15:12] <_joe_> !log uploading calico/node:1.0.0-beta-rc5 to the docker registry [18:16:11] RECOVERY - IPsec on cp2018 is OK: Strongswan OK - 36 ESP OK [18:17:32] milimetric: No bug that I know of for getting added to the wmf ldap group. The last known procedure for doing so was "poke ostriches to add them" [18:17:41] PROBLEM - NTP on cp3010 is CRITICAL: NTP CRITICAL: Offset unknown [18:18:30] bd808, milimetric: "milimetric is already a member of the group, skipping." [18:19:03] <_joe_> uhm is logmsgbot not working? [18:19:13] bd808: I love that the official procedure is still "ask until someone thinks of Chad and bugs him" :) [18:19:18] it's probably hanging out with wikibugs on vacation [18:19:21] <_joe_> morebots is not here [18:19:22] (seriously, I don't mind because it's trivial, but it's funny) [18:19:34] they all need restarting [18:19:41] _joe_: stashbot handles !log messages now. I'll check to see what's up with it [18:19:41] PROBLEM - IPsec on cp1051 is CRITICAL: Strongswan CRITICAL - ok: 22 not-conn: cp3009_v4, cp3009_v6 [18:19:47] bblack: In the middle of moving to Canada ;-) [18:19:59] labs did maintainence yesturday, i had to restart grrrit-wm yesturday. [18:20:56] legoktm i am wondering could you restart wikibugs please? [18:21:07] The mw page list you as a person to contact for wikibugs. [18:21:22] _joe_: ^ stashbot is back. 
you can retry your !log [18:21:48] !log X [18:21:59] jouncebot: next [18:21:59] In 0 hour(s) and 38 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161115T1900) [18:22:06] Eh sure, lez do it [18:22:21] RECOVERY - NTP on cp4004 is OK: NTP OK: Offset -0.00021058321 secs [18:22:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:22:49] <_joe_> !log uploading calico/node:1.0.0-beta-rc5 to the docker registry T150434 [18:22:50] !log applying schema change on s6 (page) T69223 [18:22:51] PROBLEM - IPsec on cp1058 is CRITICAL: Strongswan CRITICAL - ok: 20 not-conn: cp3009_v4, cp3009_v6, cp4001_v4, cp4001_v6 [18:22:56] !log gerrit: bringing down for a minute or two for quick point upgrade, T143089 [18:23:01] PROBLEM - IPsec on kafka1020 is CRITICAL: Strongswan CRITICAL - ok: 144 not-conn: cp3009_v4, cp3009_v6, cp4001_v4, cp4001_v6 [18:23:01] PROBLEM - IPsec on kafka1022 is CRITICAL: Strongswan CRITICAL - ok: 144 not-conn: cp3009_v4, cp3009_v6, cp4001_v4, cp4001_v6 [18:23:01] PROBLEM - IPsec on cp2012 is CRITICAL: Strongswan CRITICAL - ok: 32 not-conn: cp3009_v4, cp3009_v6, cp4001_v4, cp4001_v6 [18:23:11] PROBLEM - IPsec on cp1061 is CRITICAL: Strongswan CRITICAL - ok: 20 not-conn: cp3009_v4, cp3009_v6, cp4001_v4, cp4001_v6 [18:23:11] PROBLEM - IPsec on cp2018 is CRITICAL: Strongswan CRITICAL - ok: 32 not-conn: cp3009_v4, cp3009_v6, cp4001_v4, cp4001_v6 [18:23:11] PROBLEM - IPsec on cp2025 is CRITICAL: Strongswan CRITICAL - ok: 32 not-conn: cp3009_v4, cp3009_v6, cp4001_v4, cp4001_v6 [18:23:21] PROBLEM - Host cp3009 is DOWN: PING CRITICAL - Packet loss = 100% [18:23:21] PROBLEM - IPsec on kafka1018 is CRITICAL: Strongswan CRITICAL - ok: 144 not-conn: cp3009_v4, cp3009_v6, cp4001_v4, cp4001_v6 [18:23:21] PROBLEM - IPsec on kafka1014 is CRITICAL: Strongswan CRITICAL - ok: 144 not-conn: cp3009_v4, cp3009_v6, cp4001_v4, cp4001_v6 [18:23:21] PROBLEM - IPsec on kafka1013 is CRITICAL: Strongswan 
CRITICAL - ok: 144 not-conn: cp3009_v4, cp3009_v6, cp4001_v4, cp4001_v6 [18:23:21] PROBLEM - IPsec on cp1045 is CRITICAL: Strongswan CRITICAL - ok: 20 not-conn: cp3009_v4, cp3009_v6, cp4001_v4, cp4001_v6 [18:23:28] log msgs lag [18:23:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:23:31] T150434: Build calico - https://phabricator.wikimedia.org/T150434 [18:23:31] PROBLEM - IPsec on kafka1012 is CRITICAL: Strongswan CRITICAL - ok: 144 not-conn: cp3009_v4, cp3009_v6, cp4001_v4, cp4001_v6 [18:23:41] PROBLEM - IPsec on cp2006 is CRITICAL: Strongswan CRITICAL - ok: 30 not-conn: cp1051_v4, cp1051_v6, cp3009_v4, cp3009_v6, cp4001_v4, cp4001_v6 [18:23:46] I think that my fancy template is slowing down logs to the main SAL [18:23:54] that page is pretty huge now [18:24:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:24:08] T69223: Schema change for page content language - https://phabricator.wikimedia.org/T69223 [18:24:41] PROBLEM - SSH access on cobalt is CRITICAL: connect to address 208.80.154.81 and port 29418: Connection refused [18:24:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:24:42] T143089: Update gerrit to 2.12.5 - https://phabricator.wikimedia.org/T143089 [18:25:55] <_joe_> !log uploaded calicoctl_1.0.0-beta-rc5~wmf1_amd64.deb to jessie-wikimedia T150434 [18:26:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:26:36] !log gerrit: back up, running 2.12.5-dirty now :) [18:26:41] RECOVERY - SSH access on cobalt is OK: SSH OK - GerritCodeReview_2.12.5-dirty (SSHD-CORE-0.14.0) (protocol 2.0) [18:26:51] PROBLEM - puppet last run on stat1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas] [18:27:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:21] PROBLEM - NTP on cp2018 is CRITICAL: NTP CRITICAL: Offset unknown [18:27:41] RECOVERY - NTP on cp3010 is OK: NTP OK: Offset 0.0008156895638 secs [18:27:51] RECOVERY - NTP on cp1058 is OK: NTP OK: Offset 0.0001286268234 secs [18:28:21] ostriches should we make this https://phabricator.wikimedia.org/T148876 public? [18:28:26] Or keep it private? [18:28:28] yeah, the load time of https://wikitech.wikimedia.org/wiki/Server_Admin_Log is gross. :/ I'll look into it [18:28:41] PROBLEM - puppet last run on kafka1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas] [18:29:01] paladox: I just made it public [18:29:09] Ok thanks :) [18:29:28] Theres a new pref setting [18:29:30] :) [18:29:36] in the edit and diff section [18:30:13] ostriches lets test grrrit-wm [18:30:25] it should have reconnected by now [18:30:29] without anyone touching it [18:30:30] :) [18:31:05] (03CR) 10Paladox: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/316983 (https://phabricator.wikimedia.org/T148478) (owner: 10Paladox) [18:31:11] Yay it worked ^^ [18:31:20] That makes me so happy. [18:31:30] mutante ^^ [18:31:57] We have line wrapping now. 
Finnaly you can view larged lined diffs [18:34:49] (03PS1) 10Papaul: Add DHCP entries for restbase201[0-2] Bug:T150680 [puppet] - 10https://gerrit.wikimedia.org/r/321704 (https://phabricator.wikimedia.org/T150680) [18:37:22] RECOVERY - NTP on cp2018 is OK: NTP OK: Offset -0.000287771225 secs [18:39:51] PROBLEM - NTP on cp1051 is CRITICAL: NTP CRITICAL: Offset unknown [18:44:31] PROBLEM - NTP on cp3008 is CRITICAL: NTP CRITICAL: Offset unknown [18:49:52] RECOVERY - NTP on cp1051 is OK: NTP OK: Offset 9.235739708e-05 secs [18:50:10] (03PS1) 10Papaul: Add partman entries for restbase201[0-2] Bug:T150680 [puppet] - 10https://gerrit.wikimedia.org/r/321706 (https://phabricator.wikimedia.org/T150680) [18:53:51] RECOVERY - puppet last run on stat1004 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [18:54:31] RECOVERY - NTP on cp3008 is OK: NTP OK: Offset 0.0007588267326 secs [18:56:41] RECOVERY - puppet last run on kafka1002 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [18:57:46] !log Archived oldest 3 months of SAL data; tried to optimize [[Module:SAL]] to speed up rendering [18:57:57] 06Operations, 06Performance-Team, 10Thumbor: Investigate differences in status codes between thumbor and image scalers - https://phabricator.wikimedia.org/T150641#2796673 (10fgiunchedi) >>! In T150641#2795261, @Gilles wrote: > Mediawiki compensates for broken thumbnail URLs when there's a redirect: > > ```... [18:58:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:58:16] still not awesome [19:00:00] 06Operations, 06Performance-Team, 10Thumbor: Investigate differences in status codes between thumbor and image scalers - https://phabricator.wikimedia.org/T150641#2796677 (10fgiunchedi) >>! In T150641#2795514, @Gilles wrote: > Mediawiki 404s on an audio-only file where Thumbor 500s. I don't think that's a bi... 
[19:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161115T1900). [19:00:04] Amir1: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [19:00:15] 06Operations, 10Gerrit, 07LDAP: Change LDAP cn to something more useful (was Rename "Dzahn" to "Daniel Zahn" in Gerrit) - https://phabricator.wikimedia.org/T113792#2796678 (10demon) [19:00:19] o/ [19:00:36] It's a config in beta, I will run the maintenance script. Just need a +2 [19:01:20] I can SWAT today [19:01:38] thanks [19:02:33] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321673 (https://phabricator.wikimedia.org/T150764) (owner: 10Ladsgroup) [19:03:15] (03Merged) 10jenkins-bot: Add German Wiktionary in beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321673 (https://phabricator.wikimedia.org/T150764) (owner: 10Ladsgroup) [19:03:21] PROBLEM - NTP on cp3007 is CRITICAL: NTP CRITICAL: Offset unknown [19:04:48] Amir1: your update should hit beta after the next beta-scap-code-update [19:05:06] yeah, checking it [19:05:08] thanks [19:05:42] !log starting branching for 1.29.0-wmf.3 [19:06:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:08:38] Amir1, you know these require more than just config and a maintenance script, right? 
[19:08:53] Krenair: yes [19:08:54] https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/Add_a_wiki [19:09:00] ok [19:09:04] Following steps in this manual [19:13:21] RECOVERY - NTP on cp3007 is OK: NTP OK: Offset 0.0008774995804 secs [19:19:46] 06Operations, 06Discovery, 10Elasticsearch, 10hardware-requests, 06Discovery-Search (Current work): elasticsearch new servers (5x eqiad / 12x codfw) - https://phabricator.wikimedia.org/T149089#2796772 (10RobH) [19:20:16] (03PS1) 10Chad: Hide stupid compiled python from git [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321707 [19:20:22] thcipriani: For you ^ [19:20:23] :) [19:20:34] thanks! [19:22:42] (03PS2) 10Chad: Remove $wmgUsabilityPrefSwitch, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321601 [19:23:05] (03PS2) 10Filippo Giunchedi: prometheus: add memcached_exporter [puppet] - 10https://gerrit.wikimedia.org/r/320702 (https://phabricator.wikimedia.org/T147326) [19:23:22] (03CR) 10Chad: [C: 032] Remove $wmgUsabilityPrefSwitch, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321601 (owner: 10Chad) [19:24:07] (03Merged) 10jenkins-bot: Remove $wmgUsabilityPrefSwitch, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321601 (owner: 10Chad) [19:24:16] 06Operations, 07Puppet, 13Patch-For-Review, 07RfC: RFC: New puppet code organization paradigm/coding standards - https://phabricator.wikimedia.org/T147718#2796780 (10Ottomata) A question emerging from https://gerrit.wikimedia.org/r/#/c/320690/: Should explicit hiera keys that are looked up from a role (ak... 
[19:24:57] (03PS2) 10Thcipriani: Hide stupid compiled python from git [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321707 (owner: 10Chad) [19:25:10] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321707 (owner: 10Chad) [19:25:11] 06Operations, 10ops-eqiad: relabel labsdb1008 to db1095, update racktables - https://phabricator.wikimedia.org/T150793#2796781 (10RobH) [19:25:14] (03CR) 10Filippo Giunchedi: [C: 032] prometheus: add memcached_exporter [puppet] - 10https://gerrit.wikimedia.org/r/320702 (https://phabricator.wikimedia.org/T147326) (owner: 10Filippo Giunchedi) [19:25:35] (03PS3) 10Filippo Giunchedi: prometheus: rename varnish_config to cluster_config [puppet] - 10https://gerrit.wikimedia.org/r/321567 [19:25:51] !log demon@tin Synchronized wmf-config/InitialiseSettings-labs.php: for beta, no-op, completeness (duration: 00m 58s) [19:26:04] (03Merged) 10jenkins-bot: Hide stupid compiled python from git [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321707 (owner: 10Chad) [19:26:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:26:15] 06Operations, 10ops-eqiad, 10hardware-requests, 10netops, 13Patch-For-Review: Move labsdb1008 to production, rename it back to db1095, use it as a temporary sanitarium - https://phabricator.wikimedia.org/T149829#2764445 (10RobH) [19:27:07] (03CR) 10Chad: [C: 04-2] "Not yet, trying to clean up some of this along the way before swapping it all." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309363 (owner: 10Chad) [19:27:31] (03CR) 10Chad: [C: 04-2] "Yeah waiting on puppet change" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317657 (owner: 10Chad) [19:29:20] (03CR) 10Filippo Giunchedi: [C: 032] prometheus: rename varnish_config to cluster_config [puppet] - 10https://gerrit.wikimedia.org/r/321567 (owner: 10Filippo Giunchedi) [19:29:32] thcipriani: swat done? 
i added a late one (same as yesterday) but can deploy myself [19:29:45] ebernhardson: oh, sorry, I missed it [19:30:27] I can do that one, just give me a sec, checking out php-1.29.0-wmf.3 [19:30:43] thcipriani: k [19:31:14] (03PS3) 10Thcipriani: Revert "Revert "Setup CirrusSearch interwiki load test"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321492 (https://phabricator.wikimedia.org/T149740) (owner: 10EBernhardson) [19:32:41] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 640 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3258506 keys, up 15 days 11 hours - replication_delay is 640 [19:34:17] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321492 (https://phabricator.wikimedia.org/T149740) (owner: 10EBernhardson) [19:34:55] (03Merged) 10jenkins-bot: Revert "Revert "Setup CirrusSearch interwiki load test"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321492 (https://phabricator.wikimedia.org/T149740) (owner: 10EBernhardson) [19:35:06] thcipriani: If any of https://gerrit.wikimedia.org/r/#/q/owner:chadh%2540wikimedia.org+status:open+project:operations/mediawiki-config+-CodeReview:%22-2%22 strike your fancy for review... 
:) [19:35:22] (03CR) 10Chad: [C: 04-2] "Not yet" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309601 (owner: 10Chad) [19:35:37] Minus the 3 I put -2 on :) [19:36:14] heh, fair enough [19:36:22] ebernhardson: your change is live on mw1099 [19:37:07] (03PS1) 10RobH: updating db1095 dns entries [dns] - 10https://gerrit.wikimedia.org/r/321711 (https://phabricator.wikimedia.org/T149829) [19:38:17] (03CR) 10RobH: [C: 032] updating db1095 dns entries [dns] - 10https://gerrit.wikimedia.org/r/321711 (https://phabricator.wikimedia.org/T149829) (owner: 10RobH) [19:38:41] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3243240 keys, up 15 days 11 hours - replication_delay is 0 [19:39:40] (03PS4) 1020after4: Enable multiple config files in phabricator [puppet] - 10https://gerrit.wikimedia.org/r/321654 (https://phabricator.wikimedia.org/T146055) [19:39:47] (03PS2) 10Chad: scap patch: minor pep8/flake8 fixes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321496 [19:40:16] (03PS1) 10Dereckson: Allow a wiki to use __NOINDEX__ and __INDEX__ in all namespaces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321712 [19:40:18] (03PS1) 10Dereckson: Allow __NOINDEX__ on all namespaces on meta. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321713 (https://phabricator.wikimedia.org/T150245) [19:40:51] (03PS1) 10Volans: add additional information on malformed responses [software/service-checker] - 10https://gerrit.wikimedia.org/r/321714 (https://phabricator.wikimedia.org/T150560) [19:41:34] thcipriani: still checking [19:41:40] ok [19:42:38] (03PS5) 1020after4: Enable multiple config files in phabricator [puppet] - 10https://gerrit.wikimedia.org/r/321654 (https://phabricator.wikimedia.org/T146055) [19:42:40] (03CR) 10Dereckson: "Deployment optimal order: IS first, then CS." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/321712 (owner: 10Dereckson) [19:43:58] (03CR) 1020after4: [C: 031] "The supporting changes to phabricator will be deployed tomorrow:" [puppet] - 10https://gerrit.wikimedia.org/r/321654 (https://phabricator.wikimedia.org/T146055) (owner: 1020after4) [19:46:26] (03PS1) 10RobH: labsdb1008 retasked as db1095 [puppet] - 10https://gerrit.wikimedia.org/r/321715 (https://phabricator.wikimedia.org/T149829) [19:46:59] (03CR) 10RobH: [C: 032] labsdb1008 retasked as db1095 [puppet] - 10https://gerrit.wikimedia.org/r/321715 (https://phabricator.wikimedia.org/T149829) (owner: 10RobH) [19:47:29] thcipriani: i can't seem to trigger any problems, lets ship it [19:47:38] 06Operations, 10ops-eqiad, 10hardware-requests, 10netops, 13Patch-For-Review: Move labsdb1008 to production, rename it back to db1095, use it as a temporary sanitarium - https://phabricator.wikimedia.org/T149829#2796865 (10RobH) [19:48:34] ebernhardson: ok, interwikisources, initialisesettings, cirrussearch-production is how I'll go. 
[19:48:43] thcipriani: sounds right [19:51:50] !log thcipriani@tin Synchronized wmf-config/CirrusSearch-interwikiSources.php: SWAT: [[gerrit:321492|Revert "Revert "Setup CirrusSearch interwiki load test"" (T149740)]] PART I (duration: 01m 43s) [19:52:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:52:11] T149740: Run load tests of cross-project searching to verify its stability - https://phabricator.wikimedia.org/T149740 [19:53:31] (03CR) 10Ottomata: "> we are afraid the eventstreams might bring the kafka cluster that powers eventbus+changeprop down" [puppet] - 10https://gerrit.wikimedia.org/r/320690 (https://phabricator.wikimedia.org/T143925) (owner: 10Ottomata) [19:53:39] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:321492|Revert "Revert "Setup CirrusSearch interwiki load test"" (T149740)]] PART II (duration: 00m 48s) [19:54:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:54:24] (03CR) 10Chad: [C: 032] scap patch: minor pep8/flake8 fixes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321496 (owner: 10Chad) [19:54:53] (03Merged) 10jenkins-bot: scap patch: minor pep8/flake8 fixes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321496 (owner: 10Chad) [19:54:58] !log thcipriani@tin Synchronized wmf-config/CirrusSearch-production.php: SWAT: [[gerrit:321492|Revert "Revert "Setup CirrusSearch interwiki load test"" (T149740)]] PART III (duration: 00m 48s) [19:55:08] ^ ebernhardson live everywhere [19:55:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:55:52] thcipriani: thanks! watching logs [19:57:34] ostriches: I identified the one you merged as not at all worrisome and https://gerrit.wikimedia.org/r/#/c/321595/ seems pretty innocuous as well, if you're fine with it rolling today/now. 
[19:57:48] (03CR) 10Jcrespo: mariadb-labs: Prepare db1095 to be the new sanitarium host (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/320752 (https://phabricator.wikimedia.org/T149829) (owner: 10Jcrespo) [19:58:05] brief spurt of warnings (still) about undefined variable from a few hosts...i wish i understood how a couple hosts always still have that problem [19:58:18] only right when deploying though [19:58:20] (03PS10) 10Jcrespo: mariadb-labs: Prepare db1095 to be the new sanitarium host [puppet] - 10https://gerrit.wikimedia.org/r/320752 (https://phabricator.wikimedia.org/T149829) [19:58:54] (03PS11) 10Jcrespo: mariadb-labs: Prepare db1095 to be the new sanitarium host [puppet] - 10https://gerrit.wikimedia.org/r/320752 (https://phabricator.wikimedia.org/T149829) [19:59:18] !log demon@tin Synchronized scap/plugins/: pep8 + gitignore, mostly no-op (duration: 00m 49s) [19:59:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:59:56] (03CR) 10Chad: [C: 032] query.php: Serve HTTP 410 Gone [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321595 (owner: 10Chad) [20:00:05] twentyafterfour: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161115T2000). [20:00:07] (03PS2) 10Chad: query.php: Serve HTTP 410 Gone [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321595 [20:00:36] (03CR) 10Jcrespo: [C: 032] "Manuel: I am going to deploy this to reinstall the servers. Normally, I would wait for your +1, but you were more or less ok with the non-" [puppet] - 10https://gerrit.wikimedia.org/r/320752 (https://phabricator.wikimedia.org/T149829) (owner: 10Jcrespo) [20:00:58] ebernhardson: I'm not really clear on that either. Those warnings seem to happen primarily while we're waiting on errors from the canary nodes, FWIW. 
Still though, unless a single request is being served from multiple nodes you wouldn't expect it to be an issue :\ [20:02:01] !log demon@tin Synchronized w/query.php: Serve http 410 instead of 500 (duration: 00m 48s) [20:02:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:02:49] ebernhardson 321492 has unneccessary whitespaces in CirrusSearch-interwikiSources.php at nearly every line [20:03:17] thcipriani: https://gerrit.wikimedia.org/r/#/c/320322/ is probably safe too :) [20:03:18] dear jouncebot it's not me today :P [20:03:30] * thcipriani updates wiki page. [20:03:40] also the file should be symlinked in docroot/conf [20:03:41] arseny92: it was auto-generated, and isn't a permenant thing [20:03:48] https://gerrit.wikimedia.org/r/#/c/317757/ is also harmless, and worth testing :) [20:04:13] (03CR) 1020after4: [C: 031] Rewrite checkoutMediaWiki as scap3 plugin [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317757 (owner: 10Chad) [20:04:27] yeah, scap prep one is interesting, I tested it, seems to work. [20:04:41] not too familiar with static.php, so skipped that one :P [20:04:59] arseny92: that particular file is only for the load test, pushing interwiki search into production dcausse is working on a more complete solution that can calculate/cache to appropriate data at runtime [20:05:05] It's basically moving some stuff around to reduce duplication. 
We've already loaded all of the MW context, so HttpStatus is available to us [20:05:10] Hard to test, but should Just Work [20:05:22] famous last words [20:06:13] ebernhardson: you could call stylize.php perhaps in your autogeneration process [20:06:18] (03PS2) 10Chad: Remove old 2007 donation PSA page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321471 [20:06:34] ebernhardson: https://github.com/wikimedia/mediawiki-tools-code-utils/blob/master/stylize.php [20:07:12] (03CR) 10Chad: [C: 032] Remove old 2007 donation PSA page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321471 (owner: 10Chad) [20:07:30] (03PS1) 10Filippo Giunchedi: role: account for labs in memcached_exporter [puppet] - 10https://gerrit.wikimedia.org/r/321717 (https://phabricator.wikimedia.org/T147326) [20:07:42] (03Merged) 10jenkins-bot: Remove old 2007 donation PSA page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/321471 (owner: 10Chad) [20:09:07] s/autogen/commit process [20:09:35] !log demon@tin Synchronized docroot/foundation/: Removing old 2007 donation stuff, broken (duration: 00m 49s) [20:09:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:54] (03PS1) 10Chad: Add branch.autosetuprebase = always to my .gitconfig [puppet] - 10https://gerrit.wikimedia.org/r/321718 [20:10:08] Trivial, if someone has a second ^^ [20:10:31] (03PS4) 10Chad: Rewrite checkoutMediaWiki as scap3 plugin [mediawiki-config] - 10https://gerrit.wikimedia.org/r/317757 [20:10:36] (03CR) 10Filippo Giunchedi: [C: 032] role: account for labs in memcached_exporter [puppet] - 10https://gerrit.wikimedia.org/r/321717 (https://phabricator.wikimedia.org/T147326) (owner: 10Filippo Giunchedi) [20:10:41] (03PS2) 10Filippo Giunchedi: role: account for labs in memcached_exporter [puppet] - 10https://gerrit.wikimedia.org/r/321717 (https://phabricator.wikimedia.org/T147326) [20:12:02] (03CR) 10Chad: [C: 032] Rewrite checkoutMediaWiki as scap3 plugin [mediawiki-config] - 
https://gerrit.wikimedia.org/r/317757 (owner: Chad) [20:13:39] (PS5) Chad: Rewrite checkoutMediaWiki as scap3 plugin [mediawiki-config] - https://gerrit.wikimedia.org/r/317757 [20:14:29] Operations, Beta-Cluster-Infrastructure, Thumbor: Thumbor keeps losing Swift auth on beta - https://phabricator.wikimedia.org/T150649#2796944 (Krenair) does prod also show u'www-authenticate': u'Swift realm="unknown"' ? [20:16:27] !log demon@tin Synchronized scap/plugins/prep.py: More scap goodies for MW (duration: 00m 49s) [20:16:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:23:09] (PS1) Paladox: Phabricator: Add a few more languages to pygments dropdown box [puppet] - https://gerrit.wikimedia.org/r/321721 (https://phabricator.wikimedia.org/T147980) [20:23:17] twentyafterfour ^^ [20:29:01] (PS3) Filippo Giunchedi: role: add Prometheus job for memcached_exporter [puppet] - https://gerrit.wikimedia.org/r/321568 (https://phabricator.wikimedia.org/T147326) [20:30:56] (CR) Filippo Giunchedi: [C: 2] role: add Prometheus job for memcached_exporter [puppet] - https://gerrit.wikimedia.org/r/321568 (https://phabricator.wikimedia.org/T147326) (owner: Filippo Giunchedi) [20:31:50] (PS1) Thcipriani: Group0 to 1.29.0-wmf.3 [mediawiki-config] - https://gerrit.wikimedia.org/r/321722 [20:34:18] !log thcipriani@tin Started scap: testwiki to 1.29.0-wmf.3 and rebuild l10n cache [20:34:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:40:28] (PS1) EBernhardson: Increase CirrusSearch interwiki load test to 25% [mediawiki-config] - https://gerrit.wikimedia.org/r/321724 (https://phabricator.wikimedia.org/T149740) [20:41:21] PROBLEM - puppet last run on ms-be1012 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [20:41:51] (PS1) Filippo Giunchedi: role: include memcached_exporter in role::memcached [puppet] - https://gerrit.wikimedia.org/r/321725 (https://phabricator.wikimedia.org/T147326) [20:42:42] paladox: very nice : [20:43:53] paladox: where is the "pygments dropdown box"? [20:43:56] (CR) Filippo Giunchedi: "Tested in deployment-prep too, works as expected." [puppet] - https://gerrit.wikimedia.org/r/321725 (https://phabricator.wikimedia.org/T147326) (owner: Filippo Giunchedi) [20:46:18] (PS1) Chad: Standardize most of the docroots [mediawiki-config] - https://gerrit.wikimedia.org/r/321726 [20:46:36] Fuck yeah ^ [20:50:38] (PS1) Ori.livneh: Re-enable AbuseFilterCachingParser everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/321728 [20:52:44] (PS2) Dzahn: Add DHCP entries for restbase201[0-2] Bug:T150680 [puppet] - https://gerrit.wikimedia.org/r/321704 (https://phabricator.wikimedia.org/T150680) (owner: Papaul) [20:54:39] (CR) Dzahn: [C: 2] Add DHCP entries for restbase201[0-2] Bug:T150680 [puppet] - https://gerrit.wikimedia.org/r/321704 (https://phabricator.wikimedia.org/T150680) (owner: Papaul) [20:54:42] can someone look at what's happening on wtp1018? CPU usage (avg 94%) and load are very high there, which does not match the rest of the cluster [20:56:45] (PS2) Dzahn: Add partman entries for restbase201[0-2] Bug:T150680 [puppet] - https://gerrit.wikimedia.org/r/321706 (https://phabricator.wikimedia.org/T150680) (owner: Papaul) [20:57:31] we haz prometheus on mc* hosts? [20:58:14] PROBLEM - puppet last run on druid1002 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [21:00:38] Operations, ops-eqiad, hardware-requests, netops, Patch-For-Review: Move labsdb1008 to production, rename it back to db1095, use it as a temporary sanitarium - https://phabricator.wikimedia.org/T149829#2797084 (jcrespo) [21:01:12] Operations, ops-eqiad, hardware-requests, netops, Patch-For-Review: Move labsdb1008 to production, rename it back to db1095, use it as a temporary sanitarium - https://phabricator.wikimedia.org/T149829#2764445 (jcrespo) Open>Resolved a:jcrespo This is done, only pending tasks are... [21:01:54] Operations, Performance-Team, Thumbor: Thumbor 504s on several images Mediawiki renders succesfully - https://phabricator.wikimedia.org/T150746#2797096 (Gilles) Makes sense. Do we care about those, then? It sounds like Nginx/Thumbor behaves better than Apache in regards to timing out at that level. [21:02:10] (PS1) Chad: Kill skins-1.5 [mediawiki-config] - https://gerrit.wikimedia.org/r/321730 [21:02:15] (PS7) Andrew Bogott: Explicitly set up /var/spool/gridengine on grid master [puppet] - https://gerrit.wikimedia.org/r/321584 [21:03:08] ah no still need https://gerrit.wikimedia.org/r/#/c/321725 [21:03:20] go godog go :) [21:04:39] !log thcipriani@tin Finished scap: testwiki to 1.29.0-wmf.3 and rebuild l10n cache (duration: 30m 20s) [21:05:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:05:39] Operations, Performance-Team, Thumbor: Investigate differences in status codes between thumbor and image scalers - https://phabricator.wikimedia.org/T150641#2797103 (Gilles) >>! In T150641#2796673, @fgiunchedi wrote: > Could it be in this case though that the _original_ got renamed and that takes pre... [21:06:54] (PS10) Andrew Bogott: Keystone: Limit password auth to certain hosts and users.
[puppet] - https://gerrit.wikimedia.org/r/320706 (https://phabricator.wikimedia.org/T150092) [21:07:04] Operations, Analytics, ChangeProp, Citoid, and 10 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2797114 (GWicke) @akosiaris, thank you for taking care of etherpad! Overall, the only remaining service that still needs some work is kartotherian, which is tracked in T1... [21:08:00] (PS1) Chad: Remove more ancient unreferenced fundraising cruft [mediawiki-config] - https://gerrit.wikimedia.org/r/321733 [21:08:14] Operations, Performance-Team, Thumbor: Investigate differences in status codes between thumbor and image scalers - https://phabricator.wikimedia.org/T150641#2797121 (Gilles) >>! In T150641#2796677, @fgiunchedi wrote: > I wonder (as a low priority) if thumbor would have enough knowledge in this case t... [21:09:10] (CR) Andrew Bogott: [C: 2] Keystone: Limit password auth to certain hosts and users. [puppet] - https://gerrit.wikimedia.org/r/320706 (https://phabricator.wikimedia.org/T150092) (owner: Andrew Bogott) [21:10:24] RECOVERY - puppet last run on ms-be1012 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [21:10:41] Operations, Performance-Team, Thumbor: Investigate why oom_kill mtail program doesn't work properly - https://phabricator.wikimedia.org/T149980#2797153 (Gilles) >>! In T149980#2796457, @fgiunchedi wrote: > I'm going to report this upstream too but I think we should look into alternative solutions too... [21:12:13] (CR) Thcipriani: [C: 2] Group0 to 1.29.0-wmf.3 [mediawiki-config] - https://gerrit.wikimedia.org/r/321722 (owner: Thcipriani) [21:12:42] (Merged) jenkins-bot: Group0 to 1.29.0-wmf.3 [mediawiki-config] - https://gerrit.wikimedia.org/r/321722 (owner: Thcipriani) [21:12:55] (PS6) Andrew Bogott: Check password/ip whitelist for wmtotp.
[puppet] - https://gerrit.wikimedia.org/r/320791 (https://phabricator.wikimedia.org/T150092) [21:13:22] (PS1) Jcrespo: Add new labs (sanitarium) host db1095 [puppet] - https://gerrit.wikimedia.org/r/321735 (https://phabricator.wikimedia.org/T150802) [21:13:50] (PS2) Jcrespo: prometheus-mysql-exporter: Add new labs (sanitarium) host db1095 [puppet] - https://gerrit.wikimedia.org/r/321735 (https://phabricator.wikimedia.org/T150802) [21:14:59] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.29.0-wmf.3 [21:15:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:15:54] PROBLEM - puppet last run on analytics1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:16:10] (PS3) Jcrespo: prometheus-mysql-exporter: Add new labs (sanitarium) host db1095 [puppet] - https://gerrit.wikimedia.org/r/321735 (https://phabricator.wikimedia.org/T150802) [21:16:26] (CR) Andrew Bogott: [C: 2] Check password/ip whitelist for wmtotp.
[puppet] - https://gerrit.wikimedia.org/r/320791 (https://phabricator.wikimedia.org/T150092) (owner: Andrew Bogott) [21:16:54] (PS4) Jcrespo: prometheus-mysql-exporter: Add new labs (sanitarium) host db1095 [puppet] - https://gerrit.wikimedia.org/r/321735 (https://phabricator.wikimedia.org/T150802) [21:17:09] (PS3) Dzahn: Add partman entries for restbase201[0-2] Bug:T150680 [puppet] - https://gerrit.wikimedia.org/r/321706 (https://phabricator.wikimedia.org/T150680) (owner: Papaul) [21:18:17] (CR) Jcrespo: [C: 2] prometheus-mysql-exporter: Add new labs (sanitarium) host db1095 [puppet] - https://gerrit.wikimedia.org/r/321735 (https://phabricator.wikimedia.org/T150802) (owner: Jcrespo) [21:19:25] (PS4) Dzahn: Add partman entries for restbase201[0-2] Bug:T150680 [puppet] - https://gerrit.wikimedia.org/r/321706 (https://phabricator.wikimedia.org/T150680) (owner: Papaul) [21:21:40] (CR) Dzahn: [C: 2] "@papaul i slightly changed it and added a new line for 201[0-2] so that it doesn't conflict with the existing 200[1-6]" [puppet] - https://gerrit.wikimedia.org/r/321706 (https://phabricator.wikimedia.org/T150680) (owner: Papaul) [21:21:57] (PS5) Dzahn: Add partman entries for restbase201[0-2] Bug:T150680 [puppet] - https://gerrit.wikimedia.org/r/321706 (https://phabricator.wikimedia.org/T150680) (owner: Papaul) [21:23:34] PROBLEM - puppet last run on mc1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:23:44] PROBLEM - MD RAID on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:25:34] RECOVERY - MD RAID on thumbor1002 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [21:26:14] RECOVERY - puppet last run on druid1002 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [21:28:14] (PS7) Andrew Bogott: Keystone: open up firewall for public keystone API [puppet] - https://gerrit.wikimedia.org/r/320787 (https://phabricator.wikimedia.org/T150092) [21:31:54] thcipriani: can you please create the changelog ? [21:32:04] matanya: yup, doing [21:37:06] ostriches hi there seems to be a lot of usage at https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&c=Miscellaneous+eqiad&h=cobalt.wikimedia.org&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=NOGROUPS_%7C_network [21:37:08] than normal [21:37:16] matanya: should be up-to-date now, sorry for the delay [21:37:25] thanks thcipriani [21:37:56] paladox: i see nothing wrong [21:37:58] a lot of cpu usage [21:38:05] ok [21:38:26] Operations, Labs, Labs-Infrastructure, netops, and 2 others: Provide public access to OpenStack APIs - https://phabricator.wikimedia.org/T150092#2797271 (Andrew) [21:38:55] just a few spikes, prolly train stuff [21:38:58] I'm not worried [21:39:07] Oh [21:39:08] ok [21:42:30] Operations, Labs, Labs-Infrastructure, netops, and 2 others: Provide public access to OpenStack APIs - https://phabricator.wikimedia.org/T150092#2797284 (Andrew) [21:43:48] Operations, Labs, Labs-Infrastructure, netops, and 2 others: Provide public access to OpenStack APIs - https://phabricator.wikimedia.org/T150092#2773828 (Andrew) [21:44:54] RECOVERY - puppet last run on analytics1048 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [21:44:57] Operations, Labs, Labs-Infrastructure, netops, and 2 others: Provide public access to OpenStack APIs - https://phabricator.wikimedia.org/T150092#2797286 (Andrew) [21:45:26] Operations, Parsoid: Parsoid deb upload failed: Need ops
intervention - https://phabricator.wikimedia.org/T150674#2797287 (ssastry) Open>Resolved a:ssastry All good now. [21:46:01] Operations, Parsoid: Parsoid deb upload failed: Need ops intervention - https://phabricator.wikimedia.org/T150674#2797291 (Dzahn) :) [21:47:27] (PS6) MarcoAurelio: Rename 'technican' and 'technician' to 'interface-editor' [mediawiki-config] - https://gerrit.wikimedia.org/r/308281 (https://phabricator.wikimedia.org/T144638) [21:47:38] (CR) jenkins-bot: [V: -1] Rename 'technican' and 'technician' to 'interface-editor' [mediawiki-config] - https://gerrit.wikimedia.org/r/308281 (https://phabricator.wikimedia.org/T144638) (owner: MarcoAurelio) [21:50:37] (PS2) Andrew Bogott: Keystone: remove explicit observer rights [puppet] - https://gerrit.wikimedia.org/r/320825 (https://phabricator.wikimedia.org/T150092) [21:51:34] RECOVERY - puppet last run on mc1011 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [21:53:01] (CR) Andrew Bogott: [C: 2] Keystone: remove explicit observer rights [puppet] - https://gerrit.wikimedia.org/r/320825 (https://phabricator.wikimedia.org/T150092) (owner: Andrew Bogott) [21:53:46] (CR) Andrew Bogott: [C: 2] Make compute:get fully public [puppet] - https://gerrit.wikimedia.org/r/320827 (https://phabricator.wikimedia.org/T150092) (owner: Andrew Bogott) [21:54:16] (PS2) Andrew Bogott: Keystone: Make the project list public [puppet] - https://gerrit.wikimedia.org/r/320826 (https://phabricator.wikimedia.org/T150092) [21:56:19] (PS2) 20after4: Phabricator: Add a few more languages to pygments dropdown box [puppet] - https://gerrit.wikimedia.org/r/321721 (https://phabricator.wikimedia.org/T147980) (owner: Paladox) [21:56:23] (CR) Andrew Bogott: [C: 2] Keystone: Make the project list public [puppet] - https://gerrit.wikimedia.org/r/320826 (https://phabricator.wikimedia.org/T150092) (owner: Andrew Bogott) [21:57:10] (PS3)
20after4: Phabricator: Add a few more languages to pygments dropdown box [puppet] - https://gerrit.wikimedia.org/r/321721 (https://phabricator.wikimedia.org/T147980) (owner: Paladox) [21:59:12] (PS2) Andrew Bogott: Make compute:get fully public [puppet] - https://gerrit.wikimedia.org/r/320827 (https://phabricator.wikimedia.org/T150092) [21:59:56] (PS4) 20after4: Phabricator: Add a few more languages to pygments dropdown box [puppet] - https://gerrit.wikimedia.org/r/321721 (https://phabricator.wikimedia.org/T147980) (owner: Paladox) [22:00:16] (CR) 20after4: [C: 1] Phabricator: Add a few more languages to pygments dropdown box [puppet] - https://gerrit.wikimedia.org/r/321721 (https://phabricator.wikimedia.org/T147980) (owner: Paladox) [22:00:37] (CR) 20after4: "After my edits this should put the settings in agreement with what's currently live in production." [puppet] - https://gerrit.wikimedia.org/r/321721 (https://phabricator.wikimedia.org/T147980) (owner: Paladox) [22:00:59] twentyafterfour ^^ thanks [22:01:01] (PS1) Rush: gridengine: refactor of init.pp for toollabs module [puppet] - https://gerrit.wikimedia.org/r/321786 [22:02:15] (PS2) Rush: gridengine: refactor of init.pp for toollabs module [puppet] - https://gerrit.wikimedia.org/r/321786 [22:03:49] (CR) jenkins-bot: [V: -1] gridengine: refactor of init.pp for toollabs module [puppet] - https://gerrit.wikimedia.org/r/321786 (owner: Rush) [22:05:38] (PS3) Rush: gridengine: refactor of init.pp for toollabs module [puppet] - https://gerrit.wikimedia.org/r/321786 [22:07:15] (PS4) Rush: gridengine: refactor of init.pp for toollabs module [puppet] - https://gerrit.wikimedia.org/r/321786 [22:12:04] (PS7) MarcoAurelio: Rename 'technican' and 'technician' to 'interface-editor' [mediawiki-config] - https://gerrit.wikimedia.org/r/308281 (https://phabricator.wikimedia.org/T144638) [22:20:43] (PS1) MarcoAurelio: Allow 'interface-editor'
users to use OATHAuth [mediawiki-config] - https://gerrit.wikimedia.org/r/321797 (https://phabricator.wikimedia.org/T150807) [22:22:45] (CR) Dzahn: Enable multiple config files in phabricator (1 comment) [puppet] - https://gerrit.wikimedia.org/r/321654 (https://phabricator.wikimedia.org/T146055) (owner: 20after4) [22:23:49] (PS5) Dzahn: Phabricator: Add a few more languages to pygments dropdown box [puppet] - https://gerrit.wikimedia.org/r/321721 (https://phabricator.wikimedia.org/T147980) (owner: Paladox) [22:24:49] (CR) 20after4: Enable multiple config files in phabricator (1 comment) [puppet] - https://gerrit.wikimedia.org/r/321654 (https://phabricator.wikimedia.org/T146055) (owner: 20after4) [22:32:55] marostegui: hi there [22:33:48] FYI, i am about to do a rename for a user with 56k edits [22:34:09] if anyone wants to object, now would be a good time [22:34:34] (CR) Dzahn: [C: 2] Phabricator: Add a few more languages to pygments dropdown box [puppet] - https://gerrit.wikimedia.org/r/321721 (https://phabricator.wikimedia.org/T147980) (owner: Paladox) [22:34:45] mutante ^^ thanks :) [22:35:21] matanya: mar*stegui is likely not around, he's in CET timezone ;) [22:35:49] thanks volans [22:35:50] (CR) 20after4: "thanks dzahn!"
[puppet] - https://gerrit.wikimedia.org/r/321721 (https://phabricator.wikimedia.org/T147980) (owner: Paladox) [22:36:22] I am starting now, i can log the action, if one wants [22:37:21] !log renaming Веденей to Serzh Ignashevich - user with +50k edits [22:37:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:37:51] paladox: applied on iridium [22:38:22] Thanks [22:38:23] :) [22:38:37] ah cool, it asks me to confirm a sysadmin is around, nice work legoktm [22:39:41] matanya question: how is a rename related to -operations? just curious [22:40:14] Zppix: it queues all user actions and changes the name [22:40:26] it is a DB and mediawiki intensive request [22:40:34] (PS9) MarcoAurelio: Remove upload rights on wikis where local uploads are disabled [mediawiki-config] - https://gerrit.wikimedia.org/r/306443 (https://phabricator.wikimedia.org/T143789) [22:40:38] (PS5) Rush: gridengine: refactor of init.pp for toollabs module [puppet] - https://gerrit.wikimedia.org/r/321786 [22:40:43] (PS6) 20after4: Enable multiple config files in phabricator [puppet] - https://gerrit.wikimedia.org/r/321654 (https://phabricator.wikimedia.org/T146055) [22:41:24] RECOVERY - HTTPS-blog on blog.wikimedia.org is OK: SSL OK - Certificate blog.wikimedia.org valid until 2017-02-13 21:41:00 +0000 (expires in 89 days) [22:45:24] (PS10) MarcoAurelio: Remove upload rights on wikis where local uploads are disabled [mediawiki-config] - https://gerrit.wikimedia.org/r/306443 (https://phabricator.wikimedia.org/T143789) [22:48:56] jouncebot: next [22:48:56] In 1 hour(s) and 11 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161116T0000) [22:49:18] Dereckson: I might be able to attend that ^^ window if not full of patches [22:50:16] aocram: k [22:50:36] aocram: perhaps add engineer too for ru.? [22:51:22] Dereckson: do they have the editinterface right?
I can't remember [22:51:30] * aocram checks [22:51:39] aocram what kind of change is it [22:52:27] oh yep, sure thing adding [22:55:47] (PS1) BBlack: gerrit: smaller upload limits [puppet] - https://gerrit.wikimedia.org/r/321799 [22:56:14] (PS2) BBlack: gerrit: smaller upload limits [puppet] - https://gerrit.wikimedia.org/r/321799 [22:57:32] (CR) BBlack: [C: 2 V: 2] "ostriches virtual +1" [puppet] - https://gerrit.wikimedia.org/r/321799 (owner: BBlack) [22:59:34] (PS2) MarcoAurelio: Allow 'interface-editor' & 'engineer' users to use OATHAuth [mediawiki-config] - https://gerrit.wikimedia.org/r/321797 (https://phabricator.wikimedia.org/T150807) [22:59:59] ^^ won't that affect gerrit? [23:00:09] gerrit at 503 [23:00:14] lowering objects down to 20mb from 100mb [23:00:24] MarcoAurelio they're restarting gerrit [23:00:30] for https://gerrit.wikimedia.org/r/321799 [23:01:04] PROBLEM - puppet last run on kafka2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas] [23:01:33] ah, k :) [23:01:33] (CR) Paladox: "Won't this affect gerrit? Since that is an 80m decrease from 100m." [puppet] - https://gerrit.wikimedia.org/r/321799 (owner: BBlack) [23:02:04] paladox: Affecting things is the point of making changes to config files :P [23:02:21] Oh i mean won't it affect it in bad ways [23:02:30] since an 80m decrease is quite large [23:02:44] especially as objects take a lot of space [23:02:44] Only if you're trying to upload very large files [23:03:01] Oh, but don't we upload large files? [23:03:10] If you were to make a big change. [23:03:24] Not really. There's not that many legit cases of uploading huge files [23:03:35] Oh [23:03:46] It doesn't affect stuff already uploaded...just new uploads.
[23:03:56] ok [23:03:59] If you run into a problem or know anyone who does, file a bug and if there's a legit reason to need to upload stuff that big the restriction will probably be reverted [23:04:07] ok [23:04:19] (PS3) MarcoAurelio: Allow 'interface-editor' & 'engineer' users to use OATHAuth [mediawiki-config] - https://gerrit.wikimedia.org/r/321797 (https://phabricator.wikimedia.org/T150807) [23:11:03] Operations, Prometheus-metrics-monitoring: Port redis statistics from ganglia to prometheus - https://phabricator.wikimedia.org/T148637#2797501 (fgiunchedi) I've tried running the above exporter in labs and mc2001, results for the latter are in P4446. Note the `addr` label is superfluous and can be disca... [23:11:54] (PS2) MarcoAurelio: Set 'abusefilter-modify-global' to stewards locally at Meta-Wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/321660 (https://phabricator.wikimedia.org/T150752) [23:14:31] (CR) Brian Wolff: [C: 1] "I agree. Editinterface is one of the most sensitive rights we have." [mediawiki-config] - https://gerrit.wikimedia.org/r/321797 (https://phabricator.wikimedia.org/T150807) (owner: MarcoAurelio) [23:15:17] Thanks bawolff [23:15:54] TIL: there's a group called engineers [23:16:00] * bawolff not particularly a fan of that group name [23:16:15] me neither, we've got much discussion on Phab about that [23:16:42] MarcoAurelio: On the bug you said you were going to sign up for the swat window? [23:17:04] I proposed to call them 'interface-editors' as everywhere else (*) but they said the name was very confusing, etc. [23:17:21] yep, I'm on it, sorry for the delay [23:17:28] because engineers isn't confusing ;) [23:17:44] MarcoAurelio: No worries, I was just asking because if you weren't I was going to [23:17:56] are you logged into Wikitech?
[23:18:08] well, I should stop being lazy ;) [23:18:14] yeah [23:19:33] (CR) Alex Monk: "Also:" [mediawiki-config] - https://gerrit.wikimedia.org/r/321797 (https://phabricator.wikimedia.org/T150807) (owner: MarcoAurelio) [23:19:50] bawolff, oh it gets much worse than 'engineers' [23:20:21] o_O those as well? [23:21:05] and about technican and technician, if https://gerrit.wikimedia.org/r/#/c/308281/ is merged then we've got two less groups to take care of [23:21:21] !log restbase201[0-2] OS install [23:21:22] foundationwiki already has 2FA enabled for everyone. And I think donatewiki too [23:21:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:21:51] all private and fishbowl wikis have it enabled [23:23:56] bawolff: added to the calendar [23:24:06] :) [23:30:04] RECOVERY - puppet last run on kafka2002 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [23:32:04] https://donate.wikimedia.org/wiki/Special:ListGroupRights says that ['*']['oathauth-enable'] = true; [23:32:27] had to trick the wiki a bit because it always shows a donation campaign ;) [23:38:34] (PS3) Dzahn: add mapped IPv6 address for krypton [puppet] - https://gerrit.wikimedia.org/r/316041 [23:40:13] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to stat1003, stat1002 and fluorine for chelsyx - https://phabricator.wikimedia.org/T142648#2797633 (debt) [23:45:32] (PS1) Filippo Giunchedi: role: add external_labels to ops prometheus [puppet] - https://gerrit.wikimedia.org/r/321813 (https://phabricator.wikimedia.org/T150486) [23:45:34] (PS1) Filippo Giunchedi: role: add prometheus 'global' instance [puppet] - https://gerrit.wikimedia.org/r/321814 (https://phabricator.wikimedia.org/T150486) [23:47:25] (CR) jenkins-bot: [V: -1] role: add prometheus 'global' instance [puppet] - https://gerrit.wikimedia.org/r/321814 (https://phabricator.wikimedia.org/T150486) (owner: Filippo Giunchedi)
[23:52:16] (CR) Dzahn: [C: 2] add mapped IPv6 address for krypton [puppet] - https://gerrit.wikimedia.org/r/316041 (owner: Dzahn)