[00:00:04] Will have to wait for stuff to expire
[00:00:16] Yeah, 5 min tops
[00:00:27] Use XWD
[00:00:54] XWD>
[00:01:06] https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug
[00:01:10] Chrome extension
[00:01:13] flip the switch :)
[00:07:55] Operations, MediaWiki-General-or-Unknown, HHVM, MW-1.28-release-notes, and 2 others: HHVM: segfault when serializing/unserializing large preprocessor cache items - https://phabricator.wikimedia.org/T73486#2497478 (Danny_B)
[00:13:48] Operations, Ops-Access-Requests, LDAP-Access-Requests, WMF-NDA-Requests: NDA-Request Jonas Kress - https://phabricator.wikimedia.org/T140911#2497532 (Addshore) >>! In T140911#2496812, @Gehel wrote: > It would probably be easier for @Jonas to have direct access to the nginx logs on the wdqs server...
[00:19:16] Seems the error has faded now. Thanks :)
[00:22:57] (PS2) Dzahn: restbase-test: setup rsync for data from cassandra-test [puppet] - https://gerrit.wikimedia.org/r/301303
[00:29:10] (CR) Eevans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/301303 (owner: Dzahn)
[00:29:48] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/3492/" [puppet] - https://gerrit.wikimedia.org/r/301303 (owner: Dzahn)
[00:39:37] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[00:41:46] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5066708 keys - replication_delay is 0
[00:49:07] PROBLEM - Unmerged changes on repository puppet on rhodium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production).
[00:53:12] !log restbase-test2001-2003 - test rsyncing, create temp data dir. mkdir -p $(grep path /etc/rsync.d/frag-parsoid-html | cut -d= -f2)
[00:53:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:56:45] !log xenon - rsync cassandra-test data to restbase-test2001 /srv/backups/eqiad/
[00:56:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:17:45] Operations, Ops-Access-Requests, LDAP-Access-Requests, WMF-NDA-Requests: NDA-Request Jonas Kress - https://phabricator.wikimedia.org/T140911#2497627 (Dzahn) I am calling this ticket resolved because the NDA part is completed. @Jonas If you think you need actual shell access please open another t...
[01:18:33] Operations, Ops-Access-Requests, LDAP-Access-Requests, WMF-NDA-Requests: NDA-Request Jonas Kress - https://phabricator.wikimedia.org/T140911#2497628 (Dzahn) Open>Resolved
[01:20:27] PROBLEM - MariaDB Slave Lag: m3 on db1043 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1218.35 seconds
[01:24:50] (PS2) Dzahn: Add ppchelko to the deploy-service group [puppet] - https://gerrit.wikimedia.org/r/300523 (https://phabricator.wikimedia.org/T141086) (owner: Mobrovac)
[01:26:22] (CR) Dzahn: [C: 2] "approved by gwicke. was mentioned in ops meeting." [puppet] - https://gerrit.wikimedia.org/r/300523 (https://phabricator.wikimedia.org/T141086) (owner: Mobrovac)
[01:28:57] Operations, Ops-Access-Requests, Services, Patch-For-Review, Scap3: Allow Pchelolo to deploy services via Scap3 - https://phabricator.wikimedia.org/T141086#2497635 (Dzahn) Open>Resolved a:Dzahn done
[01:29:23] Operations, Ops-Access-Requests, Services, Scap3: Allow Pchelolo to deploy services via Scap3 - https://phabricator.wikimedia.org/T141086#2497638 (Dzahn)
[01:30:16] RECOVERY - Unmerged changes on repository puppet on rhodium is OK: No changes to merge.
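The !log entry at 00:53:12 creates the rsync target directory by pulling the `path` value out of an rsyncd fragment with `grep path | cut -d= -f2`. Below is a minimal Python sketch of the same step, assuming the fragment uses rsyncd.conf `key = value` syntax. The sample fragment and directory names are hypothetical, since the real contents of /etc/rsync.d/frag-parsoid-html are not shown in the log. Unlike the one-liner, it only matches lines whose key is exactly `path` and strips surrounding whitespace.

```python
import os
import tempfile


def rsync_paths(fragment_text):
    """Return the values of every `path = ...` setting in an rsyncd.conf-style fragment.

    Only lines whose key is exactly `path` count; comments, module headers such as
    `[name]`, and other keys are skipped, and surrounding whitespace is stripped.
    """
    paths = []
    for raw in fragment_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "path":
            paths.append(value.strip())
    return paths


def ensure_dirs(paths):
    """Create each directory and any missing parents, like `mkdir -p`."""
    for path in paths:
        os.makedirs(path, exist_ok=True)


if __name__ == "__main__":
    # Hypothetical fragment; the real frag-parsoid-html is not shown in the log.
    sample = "[parsoid-html]\n    path = /srv/tmp/parsoid-html\n    read only = yes\n"
    print(rsync_paths(sample))  # ['/srv/tmp/parsoid-html']

    # Create the directory under a throwaway root instead of the real filesystem.
    with tempfile.TemporaryDirectory() as root:
        targets = [os.path.join(root, p.lstrip("/")) for p in rsync_paths(sample)]
        ensure_dirs(targets)
        print(all(os.path.isdir(t) for t in targets))  # True
```

Because `grep path` also matches comments or any other line containing the word, parsing the key explicitly is the safer choice if the fragment ever grows.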
[01:31:11] Operations, Ops-Access-Requests: Platonides access to #mediawiki_security - https://phabricator.wikimedia.org/T140288#2459280 (Dzahn) @Platonides ^
[01:31:32] Operations, Ops-Access-Requests, Patch-For-Review: Add marktraceur to statistics-privatedata-users for access to stat1002 - https://phabricator.wikimedia.org/T140132#2454200 (Dzahn) Manager might be on vacation. Maybe we can get a director to ack it or expedite it another way.
[01:33:24] Operations, Ops-Access-Requests, Patch-For-Review: Add marktraceur to statistics-privatedata-users for access to stat1002 - https://phabricator.wikimedia.org/T140132#2497662 (Dzahn) @Nuria can you maybe approve this as analytics manager?
[01:35:27] (PS2) Dzahn: Revert "Gerrit: Run list_reviewer_counts cron as root" [puppet] - https://gerrit.wikimedia.org/r/300818
[01:37:36] (CR) Dzahn: [C: 2] "per comments on the original change" [puppet] - https://gerrit.wikimedia.org/r/300818 (owner: Dzahn)
[01:41:41] !log ytterbium - shutdown -h now, over and out
[01:41:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:45:07] RECOVERY - MariaDB Slave Lag: m3 on db1043 is OK: OK slave_sql_lag Replication lag: 0.65 seconds
[01:46:45] (PS2) Dzahn: Gerrit: Remove bugzilla password, unused since 4eva [puppet] - https://gerrit.wikimedia.org/r/300931 (owner: Chad)
[01:46:57] (CR) Dzahn: [C: 2] Gerrit: Remove bugzilla password, unused since 4eva [puppet] - https://gerrit.wikimedia.org/r/300931 (owner: Chad)
[01:48:42] (PS1) Dzahn: contint: remove firewall rule for ytterbium [puppet] - https://gerrit.wikimedia.org/r/301322
[01:50:19] (CR) Dzahn: "has a dependency "submit incl parents"" [puppet] - https://gerrit.wikimedia.org/r/300931 (owner: Chad)
[01:53:27] Operations, Gerrit, Release-Engineering-Team, Patch-For-Review: replace gerrit server (ytterbium) with jessie server (lead) - https://phabricator.wikimedia.org/T125018#2497685 (Dzahn) 18:47 < mutante> !log ytterbium - shutdown -h now, over and out 18:54 < grrrit-wm> (PS1) Dzahn: contint: remove...
[01:53:41] (PS2) Dzahn: contint: remove firewall rule for ytterbium [puppet] - https://gerrit.wikimedia.org/r/301322 (https://phabricator.wikimedia.org/T125018)
[01:57:10] !log lead removed reviewer_count job from root's crontab
[01:57:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:00:32] (CR) Dzahn: "manually removed from root's crontab and confirmed i could run it as gerrit2 user without permission problem." [puppet] - https://gerrit.wikimedia.org/r/300818 (owner: Dzahn)
[02:02:02] (PS3) Dzahn: Gerrit: Remove bugzilla password, unused since 4eva [puppet] - https://gerrit.wikimedia.org/r/300931 (owner: Chad)
[02:02:28] (CR) Dzahn: [C: 2] "ytterbium is down" [puppet] - https://gerrit.wikimedia.org/r/301322 (https://phabricator.wikimedia.org/T125018) (owner: Dzahn)
[02:03:52] (PS4) Dzahn: Gerrit: Remove bugzilla password, unused since 4eva [puppet] - https://gerrit.wikimedia.org/r/300931 (owner: Chad)
[02:05:51] (CR) Dzahn: "recheck" [puppet] - https://gerrit.wikimedia.org/r/300931 (owner: Chad)
[02:07:28] gerrit restart, yes also doing the bot
[02:08:02] over
[02:09:25] !log restarted grrrit-wm after removing bugzilla password from gerrit
[02:09:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:12:25] (Abandoned) Dzahn: ldap: Fix typo [puppet] - https://gerrit.wikimedia.org/r/301199 (owner: Yuvipanda)
[02:16:22] (PS1) Dzahn: remove ytterbium.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/301324 (https://phabricator.wikimedia.org/T12518)
[02:17:36] (PS2) Dzahn: remove ytterbium.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/301324 (https://phabricator.wikimedia.org/T12518)
[02:19:23] (CR) Dzahn: [C: 2] remove ytterbium.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/301324 (https://phabricator.wikimedia.org/T12518) (owner: Dzahn)
[02:20:38] mutante, https://phabricator.wikimedia.org/T12518#2497710 - wrong task?
[02:21:47] Operations, Gerrit, Release-Engineering-Team: decom ytterbium (datacenter) - https://phabricator.wikimedia.org/T141415#2497714 (Dzahn)
[02:22:06] Operations, ops-eqiad: decom ytterbium (datacenter) - https://phabricator.wikimedia.org/T141415#2497729 (Dzahn)
[02:22:20] Krenair: oops, yes, wrong task
[02:22:52] 125 0 18
[02:22:55] is the right one
[02:23:29] i cant delete the comment since it's not my own but the bot's
[02:24:57] Operations, Gerrit, Release-Engineering-Team, Patch-For-Review: replace gerrit server (ytterbium) with jessie server (lead) - https://phabricator.wikimedia.org/T125018#1972391 (Dzahn) removed ytterbium from DNS in https://gerrit.wikimedia.org/r/#/c/301324/
[02:25:45] Operations, Patch-For-Review, Tracking: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#2497783 (Dzahn)
[02:25:47] Operations, Gerrit, Patch-For-Review: Update gerrit sshkey in role::ci::slave::labs when upgrade to Jessie happens - https://phabricator.wikimedia.org/T131903#2497782 (Dzahn)
[02:25:52] Operations, Gerrit, Release-Engineering-Team, Patch-For-Review: replace gerrit server (ytterbium) with jessie server (lead) - https://phabricator.wikimedia.org/T125018#2497780 (Dzahn) Open>Resolved no more remnants in puppet or DNS, except mgmt DNS, continued in subtask now
[02:26:23] Operations, Gerrit, Release-Engineering-Team: replace gerrit server (ytterbium) with jessie server (lead) - https://phabricator.wikimedia.org/T125018#2497785 (Dzahn)
[02:27:19] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.11) (duration: 09m 11s)
[02:27:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:27:40] (PS3) Dzahn: chromium: Ubuntu and Debian compatibility (WIP) [puppet] - https://gerrit.wikimedia.org/r/300491
[02:28:40] Operations, Tracking: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#2497806 (Dzahn)
[02:28:57] Operations, Tracking: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#1936645 (Dzahn) ytterbium shut down for real, Total count: 16
[02:29:00] mutante, usual practice is to make a new comment saying the above was a mistake
[02:29:14] Krenair: i did
[02:29:22] ok
[02:31:23] Operations, Ops-Access-Requests: Requesting access to stat1003.eqiad.wmnet for WMDE-jand - https://phabricator.wikimedia.org/T141339#2497818 (Dzahn) p:Triage>Normal
[02:33:20] Operations, MediaWiki-Releasing, Parsoid, Release-Engineering-Team: debian signing keyid E84AFDD2 has expired - https://phabricator.wikimedia.org/T141400#2497820 (Dzahn) p:Triage>High
[02:34:54] Operations, Analytics: stat1004 doesn't show up in ganglia - https://phabricator.wikimedia.org/T141360#2497823 (Dzahn) p:Triage>Normal
[02:35:38] Operations, Analytics: stat1004 doesn't show up in ganglia - https://phabricator.wikimedia.org/T141360#2495588 (Dzahn) a:Dzahn
[02:37:58] Operations, DBA: Separate host lookup from the sql shell script - https://phabricator.wikimedia.org/T141255#2497829 (Dzahn)
[02:39:04] Operations, DBA: Separate host lookup from the sql shell script - https://phabricator.wikimedia.org/T141255#2491652 (Dzahn) ``` [terbium:~] $ which sql /usr/local/bin/sql [terbium:~] $ cat /usr/local/bin/sql #!/bin/bash # This file is managed by Puppet (modules/scap/files/sql). ```
[02:39:38] PROBLEM - puppet last run on cp3037 is CRITICAL: CRITICAL: puppet fail
[02:40:20] Operations, DBA: Separate host lookup from the sql shell script - https://phabricator.wikimedia.org/T141255#2497832 (Dzahn) ``` # Look up MySQL host to connect to. For centralauth the host cannot # be determined this way, so we need to use fawiki instead as it is # located on the same server in both pro...
[02:59:54] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.12) (duration: 15m 29s)
[02:59:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:05:56] RECOVERY - puppet last run on cp3037 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[03:06:43] !log l10nupdate@tin ResourceLoader cache refresh completed at Wed Jul 27 03:06:43 UTC 2016 (duration 6m 49s)
[03:06:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:07:02] (PS1) Dzahn: amire80 .bashrc, add alias for sql host lookup [puppet] - https://gerrit.wikimedia.org/r/301326 (https://phabricator.wikimedia.org/T141255)
[03:08:33] Operations, DBA: Separate host lookup from the sql shell script - https://phabricator.wikimedia.org/T141255#2497839 (Dzahn) @Amire80 on terbium, in your home dir in .bashrc , add this code: https://gerrit.wikimedia.org/r/#/c/301326/1/modules/admin/files/home/amire80/.bashrc then "source .bashrc" and th...
[03:09:46] Operations, DBA: Separate host lookup from the sql shell script - https://phabricator.wikimedia.org/T141255#2497852 (Dzahn) [terbium:~] $ vi .bashrc [terbium:~] $ source .bashrc dzahn@terbium:~$ sqlhost dewiki db1092
[03:10:40] Operations, DBA: Separate host lookup from the sql shell script - https://phabricator.wikimedia.org/T141255#2497854 (Dzahn) p:Triage>Normal
[03:48:27] RECOVERY - cassandra-c CQL 10.192.48.48:9042 on restbase2005 is OK: TCP OK - 0.036 second response time on port 9042
[03:58:18] PROBLEM - puppet last run on elastic2014 is CRITICAL: CRITICAL: puppet fail
[04:26:27] RECOVERY - puppet last run on elastic2014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:04:36] (PS4) Tim Starling: Add Html5Depurate module and role [puppet] - https://gerrit.wikimedia.org/r/301062
[05:05:54] (CR) Tim Starling: [C: 2] Add Html5Depurate module and role [puppet] - https://gerrit.wikimedia.org/r/301062 (owner: Tim Starling)
[05:06:14] (CR) Tim Starling: "Merging for testing in labs." [puppet] - https://gerrit.wikimedia.org/r/301062 (owner: Tim Starling)
[05:08:47] (PS1) Chad: Deployment master: Make sure that none of MediaWiki got taken by a root [puppet] - https://gerrit.wikimedia.org/r/301327
[05:09:47] (CR) jenkins-bot: [V: -1] Deployment master: Make sure that none of MediaWiki got taken by a root [puppet] - https://gerrit.wikimedia.org/r/301327 (owner: Chad)
[05:15:30] (PS2) Chad: Deployment master: Make sure that none of MediaWiki got taken by a root [puppet] - https://gerrit.wikimedia.org/r/301327
[06:10:34] Operations, DBA: Separate host lookup from the sql shell script - https://phabricator.wikimedia.org/T141255#2498003 (Amire80) Thanks, but the point is that I already made such tricks for my own account, and it would be useful to have the same for other accounts. I have several scripts that I run there re...
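The T141255 comments ask for the host lookup inside the `sql` wrapper to be usable on its own, for example by other scripts, rather than only through a per-user alias such as `sqlhost dewiki` (which printed `db1092`). Below is a minimal Python sketch of such a standalone lookup, assuming a plain wiki-to-host mapping. The mapping and the function name `sql_host` are hypothetical, since the real script's lookup method is not shown. The only rule taken from the log is the quoted comment that centralauth has to be resolved via fawiki.

```python
def sql_host(wiki, host_for_wiki):
    """Return the database host for a wiki, usable from scripts without opening a mysql shell.

    host_for_wiki is a hypothetical mapping (wiki name -> host). As the comment in
    /usr/local/bin/sql notes, centralauth cannot be resolved directly, so it is
    looked up via fawiki, which the comment says is on the same server.
    """
    lookup = "fawiki" if wiki == "centralauth" else wiki
    try:
        return host_for_wiki[lookup]
    except KeyError:
        raise LookupError(f"no database host known for {wiki!r}") from None


if __name__ == "__main__":
    # Only dewiki -> db1092 comes from the log; the fawiki entry is a placeholder.
    hosts = {"dewiki": "db1092", "fawiki": "db-example"}
    print(sql_host("dewiki", hosts))       # db1092
    print(sql_host("centralauth", hosts))  # db-example
```

Keeping the lookup in one function that returns a host name lets the interactive wrapper and other scripts share the same special cases instead of each user copying an alias.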
[06:13:54] (CR) Muehlenhoff: [C: 1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/301309 (https://phabricator.wikimedia.org/T136957) (owner: GWicke)
[06:17:39] <_joe_> moritzm: isn't stdout=journal the default already?
[06:17:53] <_joe_> I don't think we need that
[06:20:38] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad is CRITICAL: CRITICAL - failed 20 probes of 244 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[06:26:37] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 18 probes of 244 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[06:29:06] ah indeed, you're right
[06:30:44] (CR) Giuseppe Lavagetto: [C: 1] Remove old trusty scalers from conftool-data and dsh [puppet] - https://gerrit.wikimedia.org/r/301138 (https://phabricator.wikimedia.org/T141352) (owner: Muehlenhoff)
[06:31:06] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:37] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:48] PROBLEM - puppet last run on wtp2008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:07] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:17] PROBLEM - puppet last run on ms-be2026 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:27] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:37] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:33:08] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:51:04] Operations, DBA: Separate host lookup from the sql shell script - https://phabricator.wikimedia.org/T141255#2491652 (jcrespo) > if I need to connect to a host from another script in a way that doesn't involve running the mysql command as such. > I have several scripts that I run there regularly Could y...
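The RIPE Atlas ping checks alert on an absolute number of failed probes ("alerts on 19"), not on a share of all probes. Below is a minimal Python sketch that reproduces the alert text seen in this log, assuming the check turns CRITICAL when the failure count is greater than the threshold. The log only shows 2 and 18 failures as OK and 20, 40 and 47 as CRITICAL, so behaviour at exactly 19 is a guess, and the function name is hypothetical.

```python
def atlas_ping_status(failed, total, alert_on=19):
    """Format a RIPE Atlas ping check result in the style of the alerts in this log.

    Assumption: CRITICAL when the number of failed probes is greater than alert_on.
    The threshold is an absolute probe count; `total` is only reported.
    """
    state = "CRITICAL" if failed > alert_on else "OK"
    return f"{state} - failed {failed} probes of {total} (alerts on {alert_on})"


if __name__ == "__main__":
    print(atlas_ping_status(20, 244))  # CRITICAL - failed 20 probes of 244 (alerts on 19)
    print(atlas_ping_status(18, 244))  # OK - failed 18 probes of 244 (alerts on 19)
```

Because the threshold is a count rather than a ratio, 20 of 244 probes (about 8%) and 40 of 400 (10%) both alert, while the check is far more sensitive for a site measured by few probes.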
[06:56:06] RECOVERY - puppet last run on wtp2008 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[06:56:17] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:56:27] RECOVERY - puppet last run on ms-be2026 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[06:56:28] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[06:56:53] !log dropping tables from m4 shard T141407
[06:56:55] T141407: Drop EventLogging tables for ImageMetricsLoadingTime and ImageMetricsCorsSupport - https://phabricator.wikimedia.org/T141407
[06:56:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:57:08] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[06:57:16] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[06:57:18] Operations, DBA: Separate host lookup from the sql shell script - https://phabricator.wikimedia.org/T141255#2498032 (Amire80) >>! In T141255#2498028, @jcrespo wrote: >> if I need to connect to a host from another script in a way that doesn't involve running the mysql command as such. > >> I have several...
[06:57:56] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:47] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:09:42] Operations, DBA: Separate host lookup from the sql shell script - https://phabricator.wikimedia.org/T141255#2498051 (jcrespo) > https://gerrit.wikimedia.org/r/#/c/282312/3/bash/published_despite_errors.py That looks like something you shouldn't run from terbium (but more analytics-research-y), but from...
[07:11:12] !log installing perl security updates in esams and codfw
[07:11:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:14:21] Operations, DBA: Separate host lookup from the sql shell script - https://phabricator.wikimedia.org/T141255#2498063 (jcrespo) Answer your original question, these are the easy dns entries that you should probably be using, which makes things easier, and generally points to the right hosts on each occasio...
[07:16:13] (CR) Giuseppe Lavagetto: [C: 1] Replace manually-maintained bastiononly group with the new 'all-users' [puppet] - https://gerrit.wikimedia.org/r/301149 (https://phabricator.wikimedia.org/T114161) (owner: Alex Monk)
[07:29:29] (PS2) Elukey: Add the permissions_validity_in_ms among the configurable parameters [puppet] - https://gerrit.wikimedia.org/r/301083 (https://phabricator.wikimedia.org/T140869)
[07:37:14] (CR) Giuseppe Lavagetto: "It is already the default to send stdout and stderr to the journal and I am not sure this setting will do what you want it to." [puppet] - https://gerrit.wikimedia.org/r/301309 (https://phabricator.wikimedia.org/T136957) (owner: GWicke)
[07:38:35] (PS1) Jcrespo: Swithchover via dns dbproxy1001 (m1-master) to dbproxy1006 [dns] - https://gerrit.wikimedia.org/r/301330 (https://phabricator.wikimedia.org/T125027)
[07:41:55] (PS3) Giuseppe Lavagetto: Parsoid: clean up the manifests and files [puppet] - https://gerrit.wikimedia.org/r/300067 (https://phabricator.wikimedia.org/T90668) (owner: Mobrovac)
[07:43:13] (CR) Giuseppe Lavagetto: [C: 1] Swithchover via dns dbproxy1001 (m1-master) to dbproxy1006 [dns] - https://gerrit.wikimedia.org/r/301330 (https://phabricator.wikimedia.org/T125027) (owner: Jcrespo)
[07:44:27] (CR) Giuseppe Lavagetto: [C: 2] Parsoid: clean up the manifests and files [puppet] - https://gerrit.wikimedia.org/r/300067 (https://phabricator.wikimedia.org/T90668) (owner: Mobrovac)
[07:44:49] I've tested bacular, etherpad, puppet, racktables with the new proxy, I think we are good to go
[07:45:16] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 40 probes of 400 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[07:45:38] <_joe_> jynus: ok let me take a quick look
[07:46:26] as this is not a db failover, we can leave the old connections ongoing for as much as we want
[07:46:44] <_joe_> I assumed so
[07:47:16] PROBLEM - Disk space on fluorine is CRITICAL: DISK CRITICAL - free space: /a 156959 MB (3% inode=99%)
[07:47:24] <_joe_> jeez
[07:47:31] <_joe_> ok let's go jynus
[07:48:13] (CR) Jcrespo: [C: 2] Swithchover via dns dbproxy1001 (m1-master) to dbproxy1006 [dns] - https://gerrit.wikimedia.org/r/301330 (https://phabricator.wikimedia.org/T125027) (owner: Jcrespo)
[07:49:18] at some point this should be really frequent, but as we have not done many, you never know what app layer can do
[07:49:50] !log update m1-master to point to dbproxy1006
[07:49:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:50:17] the 3 dns servers are now updated
[07:50:49] (PS4) Elukey: Create the group eventbus-admins [puppet] - https://gerrit.wikimedia.org/r/300860 (https://phabricator.wikimedia.org/T141013)
[07:50:53] <_joe_> still no mysql connections through 1006
[07:51:15] I was checking first availability
[07:51:16] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 2 probes of 400 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[07:51:34] <_joe_> now i see those
[07:51:49] all services I checked seem ok
[07:51:54] I will now check connection routes
[07:51:58] from processlist
[07:52:36] <_joe_> etherpad has probably a persistent connection
[07:52:52] <_joe_> also, we have to wait at least 5 mins
[07:52:56] yes
[07:53:11] I would wait, unless there is a problem
[07:53:19] and then evaluate restart some services
[07:53:25] (etherpad is a no brainer)
[07:53:39] I would be more worried about bacula, for example
[07:54:32] <_joe_> strontium and palladium are slowly migrating
[07:54:41] <_joe_> basically whenever an apache worker dies
[07:56:54] (CR) Elukey: "Daniel I saw your conversation with Brandon and Ori about this change, maybe we can discuss the journalctl * wildcard? It doesn't seem to " [puppet] - https://gerrit.wikimedia.org/r/300860 (https://phabricator.wikimedia.org/T141013) (owner: Elukey)
[08:04:38] Operations, Wikimedia-Apache-configuration: Apache mod_status metrics only available in ganglia - https://phabricator.wikimedia.org/T141424#2498147 (elukey)
[08:04:47] Operations, Wikimedia-Apache-configuration: Apache mod_status metrics only available in ganglia - https://phabricator.wikimedia.org/T141424#2498160 (elukey) p:Triage>Low
[08:06:06] so I think the ones that could cause issues in an emergency are: bacula, puppet (for being too slow) rt and librenms; only the first 2 I think are critical in an emergency
[08:06:30] and etherpadlite, of course
[08:08:04] <_joe_> elukey: let me merge a couple changes and we'll talk
[08:08:11] <_joe_> about your change
[08:08:32] _joe_ sure
[08:10:27] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[08:12:09] <_joe_> jynus: we have a quite high number of dberrors due to
[08:12:15] <_joe_> "Wikibase\Repo\Store\WikiPageEntityStore::updateWatchlist: Automatic transaction with writes in progress (from DatabaseBase::query (LinkCache::addLinkObj)), performing implicit commit!"
[08:12:27] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[08:12:30] <_joe_> are you aware of this?
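The graphite1001 alert reports what share of recent datapoints crossed a threshold (CRITICAL at 20.00% above 50.0, OK at less than 1.00% above 25.0). Below is a minimal Python sketch of that kind of check, assuming the warning and critical thresholds of 25.0 and 50.0 shown in the alerts and a hypothetical `trigger_percent` cut-off. The real check's window size, cut-off and WARNING wording are not visible in the log, and the sample series are made up.

```python
def percent_above(datapoints, threshold):
    """Percentage of non-missing datapoints strictly above threshold.

    Graphite returns None for missing datapoints, so those are ignored.
    """
    values = [v for v in datapoints if v is not None]
    if not values:
        return 0.0
    return 100.0 * sum(1 for v in values if v > threshold) / len(values)


def check_series(datapoints, warning=25.0, critical=50.0, trigger_percent=1.0):
    """Classify a metric window; trigger_percent is a hypothetical cut-off."""
    crit = percent_above(datapoints, critical)
    if crit >= trigger_percent:
        return f"CRITICAL: {crit:.2f}% of data above the critical threshold [{critical}]"
    warn = percent_above(datapoints, warning)
    if warn >= trigger_percent:
        # Hypothetical wording; no WARNING state appears in this log.
        return f"WARNING: {warn:.2f}% of data above the warning threshold [{warning}]"
    return f"OK: Less than {trigger_percent:.2f}% above the threshold [{warning}]"


if __name__ == "__main__":
    # Hypothetical per-minute exception counts: one of five points exceeds 50.
    print(check_series([10, 60, 12, 11, 9]))
    # CRITICAL: 20.00% of data above the critical threshold [50.0]
    print(check_series([10, 12, None]))
    # OK: Less than 1.00% above the threshold [25.0]
```

Alerting on a percentage of points rather than a single sample smooths out one-off spikes, but, as the discussion that follows shows, it cannot tell a new failure from a known, already-reported message that happens to be frequent.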
[08:13:17] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 47 probes of 400 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[08:13:55] _joe_, yes
[08:14:04] it is not a fatal error
[08:14:09] but it showed on fatal errors
[08:14:17] because aarong wanted it fixed
[08:14:26] (which I agree)
[08:14:46] but it is not like a new thing, it has been there for ages and now it is log spam
[08:15:16] I has spikes because it is used by some api call
[08:15:20] *it
[08:15:46] but there is nothing we can do from infrastructure, but wait
[08:16:04] there is a ticket already, and the relevant paries are aware of it
[08:17:03] <_joe_> so wait a sec
[08:17:05] https://phabricator.wikimedia.org/T140955
[08:17:07] <_joe_> I spent 5 minutes
[08:17:21] <_joe_> because someone wants a bug fixed from developers?
[08:17:23] <_joe_> wow.
[08:17:33] <_joe_> ok let me disable the mediawiki exceptions alarm then
[08:17:54] I agree it is something that should be fixed
[08:18:03] <_joe_> no my point is
[08:18:08] I do not think put it on errors was a bad move
[08:18:17] <_joe_> putting it into fatalmonitor is wrong
[08:18:27] I also spent some time on it until I discovered it
[08:18:32] <_joe_> it creates a culture of escalation and false positives
[08:18:42] he is not wrong, it is not a good practice
[08:18:54] I am divided on this, as you can see
[08:19:17] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 2 probes of 400 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[08:19:22] my largest issue is it can hide other more important issues
[08:19:38] <_joe_> that too
[08:20:04] but then I talked to myself, and I answered myself: change your kibana to -"implicit commit"
[08:20:58] (PS1) Muehlenhoff: Lower loglevel for resourceloader to info [mediawiki-config] - https://gerrit.wikimedia.org/r/301336
[08:21:10] so, in practical terms, the answer to : are you aware of X db errors, usually the answer is yes
[08:21:31] but then the next thing I do is to paste on phabricator and know if it has been reported already
[08:21:32] (CR) Giuseppe Lavagetto: [C: 1] Lower loglevel for resourceloader to info [mediawiki-config] - https://gerrit.wikimedia.org/r/301336 (owner: Muehlenhoff)
[08:22:30] (PS4) Giuseppe Lavagetto: service::node: Output std out/err to a file [puppet] - https://gerrit.wikimedia.org/r/299000 (https://phabricator.wikimedia.org/T137878) (owner: Mobrovac)
[08:23:46] s/I do not think put it on errors was a bad move/do not think put it on errors was a good move/
[08:26:09] (CR) Hashar: [C: 2] Lower loglevel for resourceloader to info [mediawiki-config] - https://gerrit.wikimedia.org/r/301336 (owner: Muehlenhoff)
[08:26:32] _joe_: jynus we can move that back to debug level
[08:26:37] (Merged) jenkins-bot: Lower loglevel for resourceloader to info [mediawiki-config] - https://gerrit.wikimedia.org/r/301336 (owner: Muehlenhoff)
[08:26:49] hashar, I am going to comment on the relevant ticket
[08:26:54] I mean the " performing implicit commit!" message
[08:27:03] I do not want to do anything without asking first
[08:27:08] I guess aaron point was to raise attention to that issue so that people notice
[08:27:26] would have been nicer to fix all of them before upping the log level though
[08:29:16] hashar, https://phabricator.wikimedia.org/T140955#2498179
[08:30:09] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Lower loglevel for resourceloader to info https://gerrit.wikimedia.org/r/#/c/301336/ (duration: 00m 26s)
[08:30:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:30:25] (CR) Hashar: "Deployed" [mediawiki-config] - https://gerrit.wikimedia.org/r/301336 (owner: Muehlenhoff)
[08:33:43] Operations, ops-eqiad, Patch-For-Review: Decommission all old mediawiki appservers in eqiad - https://phabricator.wikimedia.org/T139353#2498181 (elukey) Will keep https://etherpad.wikimedia.org/p/appservers-decom updated while I decom servers.
[08:33:48] RECOVERY - Disk space on fluorine is OK: DISK OK
[08:37:27] (PS3) Elukey: Add the permissions_validity_in_ms among the configurable parameters [puppet] - https://gerrit.wikimedia.org/r/301083 (https://phabricator.wikimedia.org/T140869)
[08:43:55] !log Decomissioning mw1018-25 (T139353)
[08:43:56] T139353: Decommission all old mediawiki appservers in eqiad - https://phabricator.wikimedia.org/T139353
[08:43:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:47:39] Operations, Discovery, Discovery-Search-Backlog, Dumps-Generation, and 2 others: Link "current" to last dump set on cirrussearch get a 404 - https://phabricator.wikimedia.org/T138176#2498201 (ArielGlenn) Open>Resolved Verified as fixed after latest run which completed yesterday. Closing.
[08:48:04] @time
[08:48:05] Time now (various TZ): Jul 27, 2016 8:48AM UTC | 4:48AM EDT |10:48AM CEST | 6:48PM AEST
[08:51:24] nice I didn't know this one!
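jynus's workaround at 08:20:04 is to exclude the known, already-reported message from the Kibana view rather than silence the alert. Below is a minimal Python sketch of that kind of exclusion over a list of log lines. The function name and the second sample message are hypothetical, and a case-insensitive substring match only approximates a Kibana phrase query, which Elasticsearch evaluates on analyzed tokens.

```python
def without_known_noise(messages, excluded_phrases=("implicit commit",)):
    """Drop log messages that contain any excluded phrase (case-insensitive).

    A rough local stand-in for appending -"implicit commit" to a Kibana query;
    Elasticsearch matches phrases on analyzed tokens, so results can differ.
    """
    lowered = [p.lower() for p in excluded_phrases]
    return [m for m in messages if not any(p in m.lower() for p in lowered)]


if __name__ == "__main__":
    logs = [
        # The known message quoted in the channel at 08:12:15.
        "Wikibase\\Repo\\Store\\WikiPageEntityStore::updateWatchlist: Automatic transaction "
        "with writes in progress (from DatabaseBase::query (LinkCache::addLinkObj)), "
        "performing implicit commit!",
        # Hypothetical example of an unrelated error that should stay visible.
        "Lost connection to MySQL server during query",
    ]
    for line in without_known_noise(logs):
        print(line)  # prints only the second (hypothetical) message
```

This only partly answers the concern raised at 08:19:22: anything matching an exclusion is hidden, so a new error that happens to contain the same phrase would be hidden too, and the exclusion list should stay short and explicit.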
[08:51:49] elukey: hehe, I requested it a while back, forgot about it and just saw someone use it in another channel :)
[08:52:34] (CR) Giuseppe Lavagetto: [C: 2] service::node: Output std out/err to a file [puppet] - https://gerrit.wikimedia.org/r/299000 (https://phabricator.wikimedia.org/T137878) (owner: Mobrovac)
[08:52:39] (PS1) Hashar: Stop logging xff from 127.0.0.1 [mediawiki-config] - https://gerrit.wikimedia.org/r/301339 (https://phabricator.wikimedia.org/T129982)
[08:53:56] Operations, Wikimedia-Site-requests, Wikimedia-log-errors: Requests to localhost spam the 'localhost' and 'xff' log buckets - https://phabricator.wikimedia.org/T129982#2498214 (hashar) a:hashar That came up again today with fluorine.eqiad.wmnet filling its disk (T141426). From https://gerrit.wi...
[08:54:21] (CR) Hashar: "A task has been filled T141426" [mediawiki-config] - https://gerrit.wikimedia.org/r/301336 (owner: Muehlenhoff)
[08:57:25] (CR) Giuseppe Lavagetto: [C: -1] "I just realized this would apply to trustys too, and we have at least parsoid running on trusty." [puppet] - https://gerrit.wikimedia.org/r/299000 (https://phabricator.wikimedia.org/T137878) (owner: Mobrovac)
[08:57:49] (CR) Hashar: "The Nov 2013 commit has been added by Roan he might remember the context for logging the XFF and detected protocol." [mediawiki-config] - https://gerrit.wikimedia.org/r/301339 (https://phabricator.wikimedia.org/T129982) (owner: Hashar)
[09:02:31] Operations, MediaWiki-Releasing, Parsoid, Release-Engineering-Team: debian signing keyid E84AFDD2 has expired - https://phabricator.wikimedia.org/T141400#2498232 (fgiunchedi) sigh, thanks for letting us know! Looks like a good occasion to switch to 4k pgp key too, I'm going to generate a new one...
[09:03:48] Puppet, Beta-Cluster-Infrastructure, Patch-For-Review, Tracking: Remove all ::beta roles in puppet - https://phabricator.wikimedia.org/T86644#2498236 (hashar)
[09:03:50] Operations, Beta-Cluster-Infrastructure, Patch-For-Review: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#2498233 (hashar) Open>Resolved a:hashar Imho there is nothing left to do. All services got transitioned :-}
[09:05:56] Operations, Beta-Cluster-Infrastructure, Patch-For-Review: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#2498240 (hashar)
[09:06:24] Operations, Beta-Cluster-Infrastructure, Patch-For-Review: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#973092 (hashar) ::beta related roles as of July 27th: ``` ./modules/role/manifests/beta/availability_collector.pp ./modules/role/manifests/beta/bastion.pp ....
[09:07:02] Operations, Beta-Cluster-Infrastructure, Patch-For-Review: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#2498242 (hashar)
[09:07:21] Puppet, Beta-Cluster-Infrastructure, Patch-For-Review, Tracking: Remove all ::beta roles in puppet - https://phabricator.wikimedia.org/T86644#2498243 (hashar)
[09:07:27] Puppet, Beta-Cluster-Infrastructure, Patch-For-Review, Tracking: Remove all ::beta roles in puppet - https://phabricator.wikimedia.org/T86644#973295 (hashar) As of July 27th 2016 in puppet.git: ``` $ find . -type f -path '*role*beta*' ./modules/role/manifests/beta/availability_collector.pp ./modu...
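hashar's comment lists the remaining ::beta role manifests with `find . -type f -path '*role*beta*'`. Below is a minimal Python equivalent, assuming it is run against a puppet.git checkout. Only the pattern comes from the log; the function name and the miniature directory tree in the demo are hypothetical.

```python
import fnmatch
import os
import tempfile


def find_files(root, pattern):
    """List regular files under root whose whole path matches a shell pattern.

    Approximates `find ROOT -type f -path PATTERN`: fnmatch's `*` also matches `/`,
    as find's -path does, and symlinks are skipped because `-type f` does not
    match them when find is not following links.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if fnmatch.fnmatchcase(path, pattern):
                matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        # Hypothetical miniature tree; only bastion.pp sits on a *role*beta* path.
        for rel in ("modules/role/manifests/beta/bastion.pp",
                    "modules/role/manifests/ci.pp"):
            full = os.path.join(repo, rel)
            os.makedirs(os.path.dirname(full), exist_ok=True)
            open(full, "w").close()
        for path in find_files(repo, "*role*beta*"):
            print(os.path.relpath(path, repo))  # modules/role/manifests/beta/bastion.pp
```

`fnmatchcase` is used instead of `fnmatch` so matching stays case-sensitive on every platform, as find's `-path` is.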
[09:08:03] Puppet, Beta-Cluster-Infrastructure, Tracking: Remove all ::beta roles in puppet - https://phabricator.wikimedia.org/T86644#2498250 (hashar)
[09:14:37] (PS1) Elukey: Remove references of mw1018->1025 decom appservers [puppet] - https://gerrit.wikimedia.org/r/301342 (https://phabricator.wikimedia.org/T139353)
[09:15:58] (PS3) Yuvipanda: Add domain labtestspice.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/301177 (https://phabricator.wikimedia.org/T130806)
[09:16:18] (CR) Hashar: "Changed bug from T141399 which has been marked as a dupe of T130806" [dns] - https://gerrit.wikimedia.org/r/301177 (https://phabricator.wikimedia.org/T130806) (owner: Yuvipanda)
[09:17:17] (PS2) Andrew Bogott: Set up spice-based remote consoles for Labs instances [puppet] - https://gerrit.wikimedia.org/r/301294 (https://phabricator.wikimedia.org/T130806)
[09:18:04] (CR) Hashar: "Changed bug from T141399 which has been marked as a dupe of T130806" [puppet] - https://gerrit.wikimedia.org/r/301294 (https://phabricator.wikimedia.org/T130806) (owner: Andrew Bogott)
[09:24:18] (CR) Ema: "recheck" [debs/python-diamond] - https://gerrit.wikimedia.org/r/296380 (https://phabricator.wikimedia.org/T138758) (owner: Ema)
[09:26:35] PROBLEM - puppet last run on mw2231 is CRITICAL: CRITICAL: puppet fail
[09:30:53] (CR) Giuseppe Lavagetto: [C: -1] "A couple of small nits but also: did you check hiera (specifically the hosts directory and regex.yaml) to see if they needed removal/updat" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/301342 (https://phabricator.wikimedia.org/T139353) (owner: Elukey)
[09:42:38] (PS2) Elukey: Remove references of mw1018->1025 decom appservers [puppet] - https://gerrit.wikimedia.org/r/301342 (https://phabricator.wikimedia.org/T139353)
[09:42:42] Operations, Commons, Wikimedia-SVG-rendering: SVG files larger than 10 MB cannot be thumbnailed -
https://phabricator.wikimedia.org/T111815#2498273 (Aklapper) >>! In T111815#2440812, @MoritzMuehlenhoff wrote: > Sounds good to me. I'll run some tests with "--unlimited" next week and if all if fine, I'...
[09:45:20] (CR) Thiemo Mättig (WMDE): "Test: https://phabricator.wikimedia.org/T12345" [puppet] - https://gerrit.wikimedia.org/r/242237 (owner: Daniel Kinzler)
[09:49:40] (CR) Elukey: "Fixed Joe's comments and also checked hiera, but I don't find any reference of the hosts." [puppet] - https://gerrit.wikimedia.org/r/301342 (https://phabricator.wikimedia.org/T139353) (owner: Elukey)
[09:49:52] (PS1) Filippo Giunchedi: releases: update public keyring [puppet] - https://gerrit.wikimedia.org/r/301346 (https://phabricator.wikimedia.org/T141400)
[09:53:25] RECOVERY - puppet last run on mw2231 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:01:28] (PS2) Filippo Giunchedi: releases: update public keyring [puppet] - https://gerrit.wikimedia.org/r/301346 (https://phabricator.wikimedia.org/T141400)
[10:01:34] (CR) Filippo Giunchedi: [C: 2 V: 2] releases: update public keyring [puppet] - https://gerrit.wikimedia.org/r/301346 (https://phabricator.wikimedia.org/T141400) (owner: Filippo Giunchedi)
[10:03:10] Operations, Commons, Wikimedia-SVG-rendering: SVG files larger than 10 MB cannot be thumbnailed - https://phabricator.wikimedia.org/T111815#2498315 (MoritzMuehlenhoff) not yet, but should be able to have a look at this end of the week or next week
[10:08:22] (CR) Thiemo Mättig (WMDE): [C: -1] "So we obviously still need this. Unfortunately the regex in this patch removes characters.
Please abandon this patch and merge I2f06f93 in" [puppet] - https://gerrit.wikimedia.org/r/242237 (owner: Daniel Kinzler)
[10:10:16] Operations, MediaWiki-Releasing, Parsoid, Release-Engineering-Team: debian signing keyid E84AFDD2 has expired - https://phabricator.wikimedia.org/T141400#2498327 (fgiunchedi) the new key is this: ``` pub 4096R/22250DD7 2016-07-27 [expires: 2019-06-12] Key fingerprint = A6FD 76E2 A61C 556...
[10:10:23] (CR) Paladox: [C: 1] node deletion delay is now configurable [debs/nodepool] (patch-queue/debian) - https://gerrit.wikimedia.org/r/252953 (owner: Hashar)
[10:12:27] (PS1) Jcrespo: Puppetize servermon m1 database user [puppet] - https://gerrit.wikimedia.org/r/301350
[10:14:14] (PS2) Jcrespo: Puppetize servermon m1 database user [puppet] - https://gerrit.wikimedia.org/r/301350
[10:15:57] (PS3) Jcrespo: Puppetize servermon m1 database user [puppet] - https://gerrit.wikimedia.org/r/301350
[10:16:19] (PS4) Jcrespo: Puppetize servermon m1 database user [puppet] - https://gerrit.wikimedia.org/r/301350
[10:16:56] (CR) Hashar: "recheck" [debs/contenttranslation/cg3] - https://gerrit.wikimedia.org/r/293485 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[10:18:05] (CR) Jcrespo: [C: 2] Puppetize servermon m1 database user [puppet] - https://gerrit.wikimedia.org/r/301350 (owner: Jcrespo)
[10:20:24] !log restarting slapd on serpens
[10:20:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:32:28] (PS5) Giuseppe Lavagetto: service::node: Output std out/err to a file [puppet] - https://gerrit.wikimedia.org/r/299000 (https://phabricator.wikimedia.org/T137878) (owner: Mobrovac)
[10:33:29] (CR) jenkins-bot: [V: -1] service::node: Output std out/err to a file [puppet] - https://gerrit.wikimedia.org/r/299000 (https://phabricator.wikimedia.org/T137878) (owner: Mobrovac)
[10:35:21] Blocked-on-Operations, Operations,
Kartographer, Wikimedia-Extension-setup, and 4 others: Enable Interactive Maps (Kartographer) on Macedonian Wikipedia - https://phabricator.wikimedia.org/T139946#2447489 (Esanders) This is a really bad idea. This task should be blocked on T138057. It was one thi...
[10:39:12] (PS1) Jcrespo: Fix missing include; fix wrong positioning or prometheus account [puppet] - https://gerrit.wikimedia.org/r/301353
[10:39:24] (PS2) Jcrespo: Fix missing include; fix wrong positioning or prometheus account [puppet] - https://gerrit.wikimedia.org/r/301353
[10:40:34] (CR) Jcrespo: [C: 2] Fix missing include; fix wrong positioning or prometheus account [puppet] - https://gerrit.wikimedia.org/r/301353 (owner: Jcrespo)
[10:41:54] (PS6) Giuseppe Lavagetto: service::node: Output std out/err to a file [puppet] - https://gerrit.wikimedia.org/r/299000 (https://phabricator.wikimedia.org/T137878) (owner: Mobrovac)
[10:42:55] !log add extra grants to db1016 and all of m1 for servermon
[10:42:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:48:58] Operations, DBA, Patch-For-Review: upgrade dbproxy1001/1002 to jessie - https://phabricator.wikimedia.org/T125027#2498391 (jcrespo) All clients are using now dbproxy1006; except bacula. I will wait for backup jobs to finish to restart it before reimaging dbproxy1001.
[10:49:58] (PS7) Giuseppe Lavagetto: service::node: Output std out/err to a file [puppet] - https://gerrit.wikimedia.org/r/299000 (https://phabricator.wikimedia.org/T137878) (owner: Mobrovac)
[10:52:53] (PS8) Giuseppe Lavagetto: service::node: Output std out/err to a file [puppet] - https://gerrit.wikimedia.org/r/299000 (https://phabricator.wikimedia.org/T137878) (owner: Mobrovac)
[11:01:24] (PS9) Giuseppe Lavagetto: service::node: Output std out/err to a file [puppet] - https://gerrit.wikimedia.org/r/299000 (https://phabricator.wikimedia.org/T137878) (owner: Mobrovac)
[11:01:49] (PS10) Giuseppe Lavagetto: service::node: Output std out/err to a file [puppet] - https://gerrit.wikimedia.org/r/299000 (https://phabricator.wikimedia.org/T137878) (owner: Mobrovac)
[11:04:31] (CR) Giuseppe Lavagetto: [C: 2] service::node: Output std out/err to a file [puppet] - https://gerrit.wikimedia.org/r/299000 (https://phabricator.wikimedia.org/T137878) (owner: Mobrovac)
[11:19:33] (PS2) Addshore: beta wgEchoMentionStatusNotifications default true [mediawiki-config] - https://gerrit.wikimedia.org/r/301098 (https://phabricator.wikimedia.org/T140234)
[11:22:17] (CR) Addshore: "Schedules for Morning SWAT today" [mediawiki-config] - https://gerrit.wikimedia.org/r/301098 (https://phabricator.wikimedia.org/T140234) (owner: Addshore)
[11:23:41] Operations, Ops-Access-Requests, Patch-For-Review, User-mobrovac: Allow *-admin groups to see systemd logs for their units - https://phabricator.wikimedia.org/T137878#2498472 (Joe) Open>Resolved
[11:24:59] !log Run initSiteStats to update statistics count on ast.wikipedia (T141432)
[11:25:01] T141432: Update statistics count on ast.wikipedia - https://phabricator.wikimedia.org/T141432
[11:25:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:29:27] does somone know whats happen here (bug?), likely dereckson -->
https://phabricator.wikimedia.org/T141431
[11:32:05] Steinsplitter: I look at that later in the afternoon
[11:32:09] Steinsplitter: Any idea when it broke?
[11:32:11] Roughly
[11:32:15] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 237 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[11:32:31] a few weeks ago it was still possible to grant.
[11:32:37] https://github.com/wikimedia/mediawiki-extensions-UploadWizard/commit/5ec53862c324ebdf37becc8e2b053d680a7dafce
[11:32:42] MatmaRex committed 21 days ago
[11:32:48] I suspect that is a starting point
[11:33:05] converting to extension registration may have screwed with the permissions
[11:34:22] Last change was at 05:05, 28 May 2016
[11:34:36] (as far i can see)
[11:38:16] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 16 probes of 237 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[12:04:59] !log disable puppet on netmon1001, debugging servermon
[12:05:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:18:52] (PS1) Alexandros Kosiaris: Followup fix for 450a2e6 [puppet] - https://gerrit.wikimedia.org/r/301359
[12:20:26] (PS1) BBlack: cache_misc: fix Authz->pass [puppet] - https://gerrit.wikimedia.org/r/301360 (https://phabricator.wikimedia.org/T141430)
[12:21:19] (CR) BBlack: [C: 2 V: 2] cache_misc: fix Authz->pass [puppet] - https://gerrit.wikimedia.org/r/301360 (https://phabricator.wikimedia.org/T141430) (owner: BBlack)
[12:22:51] !log puppet enabled on net1001
[12:22:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:23:01] !log puppet enabled on netmon1001 (correction of previous log line)
[12:23:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:28:24] !log starting wipe of cache_misc caches
[12:28:28] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:31:45] Operations, Analytics, Analytics-Cluster: Audit Hadoop worker memory usage. - https://phabricator.wikimedia.org/T118501#2498581 (elukey) Open>Resolved We have taken several steps to improve the situation during these months: 1) better monitoring of the Hadoop JVMs - https://grafana.wikimedia...
[12:33:59] Operations, Analytics: Jmxtrans failures on Kafka hosts caused metric holes in grafana - https://phabricator.wikimedia.org/T136405#2498590 (elukey)
[12:40:35] Operations, Analytics-Cluster: Install hadoop-lzo on cluster - https://phabricator.wikimedia.org/T89290#1032492 (elukey) @Ottomata still worth doing this or not?
[12:43:33] (CR) Alexandros Kosiaris: [C: 2] Change-Prop: Increase maximum concurrency for ORES [puppet] - https://gerrit.wikimedia.org/r/301305 (owner: Ppchelko)
[12:43:40] !log restarted Jenkins for some trivial plugins updates
[12:43:40] (PS2) Alexandros Kosiaris: Change-Prop: Increase maximum concurrency for ORES [puppet] - https://gerrit.wikimedia.org/r/301305 (owner: Ppchelko)
[12:43:43] (CR) Alexandros Kosiaris: [V: 2] Change-Prop: Increase maximum concurrency for ORES [puppet] - https://gerrit.wikimedia.org/r/301305 (owner: Ppchelko)
[12:44:15] (CR) Filippo Giunchedi: [C: 1] Add the permissions_validity_in_ms among the configurable parameters [puppet] - https://gerrit.wikimedia.org/r/301083 (https://phabricator.wikimedia.org/T140869) (owner: Elukey)
[12:50:55] !log disabling puppet on restbase*, aqs* and maps* as extra careful step for https://gerrit.wikimedia.org/r/301083 (no-op but better safe than sorry)
[12:50:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:51:11] (PS4) Elukey: Add the permissions_validity_in_ms among the configurable parameters [puppet] - https://gerrit.wikimedia.org/r/301083 (https://phabricator.wikimedia.org/T140869)
[12:53:36] (CR)
Elukey: [C: 2] Add the permissions_validity_in_ms among the configurable parameters [puppet] - https://gerrit.wikimedia.org/r/301083 (https://phabricator.wikimedia.org/T140869) (owner: Elukey)
[12:55:35] godog: --^ starting with aqs testing nodes, then maps testing, then restbase testing
[12:56:14] elukey: kk, sounds good!
[12:56:51] (PS2) Filippo Giunchedi: add thumbor service IPs [dns] - https://gerrit.wikimedia.org/r/300240 (https://phabricator.wikimedia.org/T139606)
[12:57:00] (CR) Filippo Giunchedi: [C: 2 V: 2] add thumbor service IPs [dns] - https://gerrit.wikimedia.org/r/300240 (https://phabricator.wikimedia.org/T139606) (owner: Filippo Giunchedi)
[13:01:38] (PS7) Filippo Giunchedi: puppetization for thumbor [puppet] - https://gerrit.wikimedia.org/r/300827 (https://phabricator.wikimedia.org/T139606)
[13:01:40] (PS2) Filippo Giunchedi: lvs: add thumbor to lvs [puppet] - https://gerrit.wikimedia.org/r/300244 (https://phabricator.wikimedia.org/T139606)
[13:04:22] godog: no op on all the nodes + restbase1010.eqiad.wmnet
[13:04:24] (CR) Filippo Giunchedi: [C: 2 V: 2] debian/statsdlb@.service: multi-instance support [software/statsdlb] - https://gerrit.wikimedia.org/r/297281 (owner: Filippo Giunchedi)
[13:04:33] I am going to re-enable puppet
[13:05:55] elukey: the config gets changed though?
[13:09:30] godog: nope since I haven't changed the default 2000 value with this code review, I'll do it only for AQS in hiera
[13:09:39] it was only to add the configurable value
[13:10:09] elukey: ah ok, yeah that makes sense
[13:11:31] thanks for the support :)
[13:16:45] PROBLEM - puppet last run on mw2231 is CRITICAL: CRITICAL: puppet fail
[13:19:07] <_joe_> that's expected ^^
[13:22:17] (PS1) Elukey: Increase AQS default auth permission caching to 30s.
[puppet] - https://gerrit.wikimedia.org/r/301365 (https://phabricator.wikimedia.org/T140869)
[13:26:26] (CR) Elukey: [C: 2] "https://puppet-compiler.wmflabs.org/3501/" [puppet] - https://gerrit.wikimedia.org/r/301365 (https://phabricator.wikimedia.org/T140869) (owner: Elukey)
[13:29:43] !log Restart Cassandra on aqs100[123] to apply the latest configuration (T140869)
[13:29:44] T140869: Investigate why cassandra per-article-daily oozie jobs fail regularly - https://phabricator.wikimedia.org/T140869
[13:29:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:32:20] (CR) Alexandros Kosiaris: [C: 2] Followup fix for 450a2e6 [puppet] - https://gerrit.wikimedia.org/r/301359 (owner: Alexandros Kosiaris)
[13:32:25] (PS2) Alexandros Kosiaris: Followup fix for 450a2e6 [puppet] - https://gerrit.wikimedia.org/r/301359
[13:32:28] (CR) Alexandros Kosiaris: [V: 2] Followup fix for 450a2e6 [puppet] - https://gerrit.wikimedia.org/r/301359 (owner: Alexandros Kosiaris)
[13:33:34] RECOVERY - puppet last run on mw2231 is OK: OK: Puppet is currently enabled, last run 11 minutes ago with 0 failures
[13:41:09] (PS1) BBlack: cache_misc: avoid caching authcookie reqs as well [puppet] - https://gerrit.wikimedia.org/r/301367
[13:42:56] (CR) BBlack: [C: 2] cache_misc: avoid caching authcookie reqs as well [puppet] - https://gerrit.wikimedia.org/r/301367 (owner: BBlack)
[13:45:25] PROBLEM - puppetmaster backend https on rhodium is CRITICAL: Connection refused
[13:45:36] PROBLEM - puppet last run on mw1147 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:50:35] PROBLEM - Disk space on krypton is CRITICAL: DISK CRITICAL - /var/spool/exim4/scan is not accessible: Permission denied
[13:52:18] (CR) Alexandros Kosiaris: "I agree with Andrew here. At least in the ldaplist being useful part.
ldapsearch, while a very powerful tool, has the drawback that one ne" [puppet] - https://gerrit.wikimedia.org/r/295475 (owner: Alexandros Kosiaris)
[13:52:36] RECOVERY - Disk space on krypton is OK: DISK OK
[13:56:06] (PS1) BBlack: cache_misc: exclude GeoIP cookie from pass as well [puppet] - https://gerrit.wikimedia.org/r/301368
[13:56:29] (CR) BBlack: [C: 2 V: 2] cache_misc: exclude GeoIP cookie from pass as well [puppet] - https://gerrit.wikimedia.org/r/301368 (owner: BBlack)
[13:58:04] Operations, ops-eqiad, Analytics-Cluster: kafka1013 hardware crash - https://phabricator.wikimedia.org/T135557#2498758 (elukey) Open>Resolved The nf_conntrack issue has been tracked in https://phabricator.wikimedia.org/T136094 We haven't seen any more recurrences of this strange issue and we...
[13:58:05] (CR) Alexandros Kosiaris: "There is on part I don't agree with Andrew btw. It's the ldaplist should be able to sprew a full dump of things. That approach is clearly " [puppet] - https://gerrit.wikimedia.org/r/295475 (owner: Alexandros Kosiaris)
[14:00:14] PROBLEM - check_puppetrun on beryllium is CRITICAL: CRITICAL: Puppet has 26 failures
[14:01:02] Operations, hardware-requests: reclaim and return all cisco servers - https://phabricator.wikimedia.org/T128821#2498762 (Cmjohnson)
[14:01:04] Operations, ops-eqiad, hardware-requests: eqiad: audit cisco servers for return/decom - https://phabricator.wikimedia.org/T140786#2498760 (Cmjohnson) Open>Resolved The CISCO's have been audited and we have 21 on-site, 3 still in use. All S/N's updated.
[14:05:14] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet has 1 failures
[14:05:14] PROBLEM - check_puppetrun on beryllium is CRITICAL: CRITICAL: Puppet has 26 failures
[14:06:42] (PS1) EBernhardson: Turn on textcat based language detection for search [mediawiki-config] - https://gerrit.wikimedia.org/r/301369
[14:06:51] Operations, Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Build 0.8.2.1 Kafka package and upgrade Kafka brokers - https://phabricator.wikimedia.org/T106581#2498770 (Ottomata)
[14:07:31] Operations, Analytics-Cluster: Install hadoop-lzo on cluster - https://phabricator.wikimedia.org/T89290#2498771 (Ottomata) Open>declined I think not! Unless someone asks for it specifically. Declined.
[14:08:48] Operations, Analytics, Analytics-Cluster: Audit Hadoop worker memory usage. - https://phabricator.wikimedia.org/T118501#2498774 (Ottomata) I think we should close this one too. I think the work you did to up the heap size probably addressed this issue. If we see it again we can reopen. Ja?
[14:10:14] RECOVERY - check_puppetrun on beryllium is OK: OK: Puppet is currently enabled, last run 128 seconds ago with 0 failures
[14:10:14] RECOVERY - check_puppetrun on boron is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[14:12:12] !log T134016: Restarting Cassandra instance to apply disabled streaming socket timeout (restbase2005-a.codfw.wmnet)
[14:12:14] T134016: RESTBase Cassandra cluster: Increase instance count to 3 - https://phabricator.wikimedia.org/T134016
[14:12:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:12:24] RECOVERY - puppet last run on mw1147 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[14:14:44] (CR) Alexandros Kosiaris: [C: 2] etherpad: move role to module, rename to ::server [puppet] - https://gerrit.wikimedia.org/r/298909 (owner: Dzahn)
[14:14:49] (PS2) Alexandros Kosiaris: etherpad: move role to module, rename to ::server [puppet] - https://gerrit.wikimedia.org/r/298909 (owner: Dzahn)
[14:14:53] (CR) Alexandros Kosiaris: [V: 2] etherpad: move role to module, rename to ::server [puppet] - https://gerrit.wikimedia.org/r/298909 (owner: Dzahn)
[14:16:44] !log T134016: Cancelling bootstrap of restbase2005-c.codfw.wmnet
[14:16:45] T134016: RESTBase Cassandra cluster: Increase instance count to 3 - https://phabricator.wikimedia.org/T134016
[14:16:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:17:13] subbu, bd808, cscott, legoktm, you are among the project admins for project 'wikitextexp.' If you are no longer involved in the project, please let me know and I'll remove you as admin.
If you /are/ involved in the project, please update https://wikitech.wikimedia.org/wiki/Purge_2016#wikitextexp
[14:17:30] and also please subscribe to labs-l so that you can receive and respond to announcements in the future
[14:17:51] wikitextexp -- is that a project for wikitext experiments?
[14:18:19] subbu: do you know anything about this?
[14:18:26] PROBLEM - cassandra-a CQL 10.192.48.46:9042 on restbase2005 is CRITICAL: Connection refused
[14:18:57] got it ^^^
[14:20:25] ACKNOWLEDGEMENT - cassandra-a CQL 10.192.48.46:9042 on restbase2005 is CRITICAL: Connection refused eevans Long running startup (see T141110) - The acknowledgement expires at: 2016-07-27 15:19:55.
[14:23:11] PROBLEM - cassandra-c CQL 10.192.48.48:9042 on restbase2005 is CRITICAL: Connection refused
[14:23:12] these too ^^^
[14:23:47] ACKNOWLEDGEMENT - cassandra-c CQL 10.192.48.48:9042 on restbase2005 is CRITICAL: Connection refused eevans Bootstrapping - The acknowledgement expires at: 2016-07-29 14:23:30.
[14:24:05] PROBLEM - cassandra-c service on restbase2005 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is failed
[14:24:46] PROBLEM - puppet last run on db2003 is CRITICAL: CRITICAL: puppet fail
[14:24:49] ACKNOWLEDGEMENT - cassandra-c service on restbase2005 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is failed eevans Pending start of bootstrap - The acknowledgement expires at: 2016-07-27 17:24:20.
[14:27:00] (PS1) Bartosz Dziewoński: Add 'upwizcampeditors' to $wgAddGroups, $wgRemoveGroups for commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/301371 (https://phabricator.wikimedia.org/T141431)
[14:31:20] (PS1) MarcoAurelio: IP cap lift for wiki meeting [mediawiki-config] - https://gerrit.wikimedia.org/r/301372 (https://phabricator.wikimedia.org/T141421)
[14:34:45] (CR) Giuseppe Lavagetto: [C: 1] Remove references of mw1018->1025 decom appservers [puppet] - https://gerrit.wikimedia.org/r/301342 (https://phabricator.wikimedia.org/T139353) (owner: Elukey)
[14:35:14] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet has 1 failures
[14:35:28] (PS3) Elukey: Remove references of mw1018->1025 decom appservers [puppet] - https://gerrit.wikimedia.org/r/301342 (https://phabricator.wikimedia.org/T139353)
[14:36:17] Operations, ops-eqiad, media-storage, Patch-For-Review: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2498877 (Cmjohnson)
[14:36:19] Operations, ops-eqiad, media-storage: diagnose failed(?) sda on ms-be1022 - https://phabricator.wikimedia.org/T140597#2498876 (Cmjohnson) Resolved>Open
[14:36:34] Operations, ops-eqiad, media-storage: diagnose failed(?)
sda on ms-be1022 - https://phabricator.wikimedia.org/T140597#2469924 (Cmjohnson) Closed this by mistake ....supposed to close ms-be1021
[14:37:31] (PS1) BBlack: cache_misc: remove varnish3 VCL compat [puppet] - https://gerrit.wikimedia.org/r/301373 (https://phabricator.wikimedia.org/T131501)
[14:37:33] (PS1) BBlack: Revert "Revert "Revert "cache_misc: do not deliver expired cached objects""" [puppet] - https://gerrit.wikimedia.org/r/301374 (https://phabricator.wikimedia.org/T134989)
[14:37:52] (CR) Elukey: [C: 2] Remove references of mw1018->1025 decom appservers [puppet] - https://gerrit.wikimedia.org/r/301342 (https://phabricator.wikimedia.org/T139353) (owner: Elukey)
[14:40:14] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet has 1 failures
[14:40:18] (PS2) MarcoAurelio: IP cap lift for Wikipedia Edit-a-thon on 2016-08-03 [mediawiki-config] - https://gerrit.wikimedia.org/r/301372 (https://phabricator.wikimedia.org/T141421)
[14:40:22] grrr. puppet--
[14:43:02] (PS1) BBlack: Revert "Revert "ssl_ciphersuite: drop non-FS AES256 options"" [puppet] - https://gerrit.wikimedia.org/r/301375
[14:43:25] (CR) BBlack: [C: 2 V: 2] Revert "Revert "ssl_ciphersuite: drop non-FS AES256 options"" [puppet] - https://gerrit.wikimedia.org/r/301375 (owner: BBlack)
[14:43:43] (PS2) BBlack: Revert "Revert "ssl_ciphersuite: drop non-FS AES256 options"" [puppet] - https://gerrit.wikimedia.org/r/301375
[14:43:47] (CR) BBlack: [V: 2] Revert "Revert "ssl_ciphersuite: drop non-FS AES256 options"" [puppet] - https://gerrit.wikimedia.org/r/301375 (owner: BBlack)
[14:45:14] RECOVERY - check_puppetrun on boron is OK: OK: Puppet is currently enabled, last run 185 seconds ago with 0 failures
[14:46:27] Operations, Commons: Please fix broken thumbnails - https://phabricator.wikimedia.org/T140536#2498914 (Raymond) Resaving (i.e. via Gimp) and reuploading the files repairs the thumbnails.
For testing purposes see now https://test.wikipedia.org/wiki/File:CQ_%E2%80%93_Cologne_Contemporary_Ukulele_Ensemble_...
[14:46:53] Operations, ops-eqiad, media-storage: diagnose failed(?) sda on ms-be1022 - https://phabricator.wikimedia.org/T140597#2498916 (Cmjohnson) New case opened w/HP Your case was successfully submitted. Please note your Case ID: 5310702226 for future reference.
[14:47:30] Operations, ops-eqiad: ms-be1021.eqiad.wmnet: slot=1I:1:2 dev=sdh failed - https://phabricator.wikimedia.org/T139767#2498917 (Cmjohnson) Open>Resolved a:Cmjohnson Received the disk and swapped it out w/old one. @fgiunchedi
[14:48:10] Operations, Commons: Please fix broken thumbnails - https://phabricator.wikimedia.org/T140536#2498921 (Raymond)
[14:49:44] RECOVERY - cassandra-a CQL 10.192.48.46:9042 on restbase2005 is OK: TCP OK - 0.036 second response time on port 9042
[14:50:33] !log T134016: Restarting Cassandra instance to apply disabled streaming socket timeout (restbase2005-b.codfw.wmnet)
[14:50:34] T134016: RESTBase Cassandra cluster: Increase instance count to 3 - https://phabricator.wikimedia.org/T134016
[14:50:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:50:46] !deploy
[14:50:52] !next
[14:50:59] what was the command?
[14:51:33] I can never remember
[14:51:34] @next
[14:51:43] I think I have 10 minutes to hack a config there
[14:51:55] RECOVERY - puppet last run on db2003 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[14:51:59] jouncebot: next
[14:51:59] In 0 hour(s) and 8 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160727T1500)
[14:52:06] Reedy: jynus: ^
[14:52:23] thanks, MatmaRex
[14:53:24] (PS2) Muehlenhoff: Remove old trusty scalers from conftool-data and dsh [puppet] - https://gerrit.wikimedia.org/r/301138 (https://phabricator.wikimedia.org/T141352)
[14:53:56] (PS1) Jcrespo: Depool db1055 for database maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/301377 (https://phabricator.wikimedia.org/T140650)
[14:54:15] it's been ages since I depooled a server, not sure if a good or bad sign
[14:54:46] (CR) Jcrespo: [C: 2] Depool db1055 for database maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/301377 (https://phabricator.wikimedia.org/T140650) (owner: Jcrespo)
[14:54:52] (CR) Muehlenhoff: [C: 2 V: 2] Remove old trusty scalers from conftool-data and dsh [puppet] - https://gerrit.wikimedia.org/r/301138 (https://phabricator.wikimedia.org/T141352) (owner: Muehlenhoff)
[14:55:00] is it CI faster or it is just me?
[14:55:16] Operations, MediaWiki-API, MediaWiki-Categories: API query allcategories finds ghost categories - https://phabricator.wikimedia.org/T141443#2498937 (doctaxon)
[14:55:36] PROBLEM - cassandra-b CQL 10.192.48.47:9042 on restbase2005 is CRITICAL: Connection refused
[14:55:40] Operations, ops-eqiad, netops: Replace cr1/2-eqiad PSUs/fantrays with high-capacity ones - https://phabricator.wikimedia.org/T140765#2498949 (Cmjohnson) @faidon yes, I did not swap cr1's fan tray but will swap out w/a new one.
@mark no, we did not get new air filters
[14:55:49] hey jynus :)
[14:56:04] I was also getting that SQL query error on a different page
[14:56:20] hey, Bsadowski1 we are going to put db1055 on the garage for some fixes
[14:56:22] <-- Posted a reply a day or so ago
[14:56:24] Ah.
[14:56:34] hopefuly we will not break it more that it is now
[14:56:47] that particular error should not happen while it is down
[14:56:55] (PS1) Paladox: Fix gitweb.file link for diffusion [puppet] - https://gerrit.wikimedia.org/r/301381 (https://phabricator.wikimedia.org/T141420)
[14:56:59] ACKNOWLEDGEMENT - cassandra-b CQL 10.192.48.47:9042 on restbase2005 is CRITICAL: Connection refused eevans Slow startup (T141110) - The acknowledgement expires at: 2016-07-27 15:56:33.
[14:57:08] but I intend to make the fix permanent rather than temporary
[14:57:29] Operations, ops-eqiad, netops: Replace cr1/2-eqiad PSUs/fantrays with high-capacity ones - https://phabricator.wikimedia.org/T140765#2498953 (mark) Probably worth buying some new ones now, and perhaps some new REs...
[14:57:35] Bsadowski1, I saw your comments on the ticket, however, this is a very specific fix for a very specific problem
[14:57:56] you will have to report other problems separately
[14:58:00] (PS2) Paladox: Fix gitweb.file link for diffusion [puppet] - https://gerrit.wikimedia.org/r/301381 (https://phabricator.wikimedia.org/T141420)
[14:58:09] although I would recommend you to wait until seeing what this does
[14:58:53] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1055 for database maintenance (duration: 00m 29s)
[14:58:54] Operations, Analytics, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2498955 (Nuria) >For experiments on "readers", we need to think carefully about how to minimize this problem and talk to other organizations > that do AB testing without a use...
[14:58:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:59:01] the summary is that for that particular query, db1055 was choosing the wrong index
[14:59:19] I am going to try to avoid that without changing code
[14:59:21] andrewbogott, cscott in use .. will update the page.
[15:00:05] anomie, ostriches, thcipriani, hashar, and twentyafterfour: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160727T1500).
[15:00:05] Addshore, ebernhardson, MatmaRex, and mafk: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[15:00:10] *waves*
[15:00:14] \o
[15:00:30] Yes
[15:00:45] RECOVERY - dhclient process on ganeti1004 is OK: PROCS OK: 0 processes with command name dhclient
[15:00:59] hi
[15:01:12] I can SWAT today
[15:01:16] andrewbogott, i am on labs-l ... but, i wasn't paying attention to emails on the list. let me take a look.
[15:01:24] RECOVERY - DPKG on ganeti1004 is OK: All packages OK
[15:02:13] (PS3) Thcipriani: beta wgEchoMentionStatusNotifications default true [mediawiki-config] - https://gerrit.wikimedia.org/r/301098 (https://phabricator.wikimedia.org/T140234) (owner: Addshore)
[15:02:30] RECOVERY - configured eth on ganeti1004 is OK: OK - interfaces up
[15:02:37] thcipriani: I was going to say you could leave my 2 for the end (and I could do them?)
[15:02:52] I did a wrong commit, tell me when I can deploy to fix it
[15:02:59] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/301098 (https://phabricator.wikimedia.org/T140234) (owner: Addshore)
[15:03:26] (Merged) jenkins-bot: beta wgEchoMentionStatusNotifications default true [mediawiki-config] - https://gerrit.wikimedia.org/r/301098 (https://phabricator.wikimedia.org/T140234) (owner: Addshore)
[15:03:35] addshore: oh, yeah, sure, ^ I'll get that one done since I merged it before I looked at IRC :\
[15:03:43] sounds good! :)
[15:03:46] Operations, Traffic: Age header reset to 0 after 24 hours on varnish frontends - https://phabricator.wikimedia.org/T141373#2498969 (ema) https://phabricator.wikimedia.org/P3585 is a VTC test case with two varnishes targeting varnish 3 and default VCL except for a 5s TTL cap on the varnish frontend. The t...
[15:04:19] (PS1) Jcrespo: Depool db1055 also from the rc/log role [mediawiki-config] - https://gerrit.wikimedia.org/r/301384 (https://phabricator.wikimedia.org/T140650)
[15:04:53] thcipriani: I think jynus wants to deploy an urgent fix-patch?
[15:05:00] it is not urgent
[15:05:09] it is a partial depool, it can wait
[15:05:17] it will not cause any issues
[15:05:28] oki
[15:05:40] PROBLEM - Host ganeti1004 is DOWN: PING CRITICAL - Packet loss = 100%
[15:05:41] jynus: kk, if you want to jump in, let me know :)
[15:05:55] if it was urgent, I would have stopped the train
[15:06:02] you have right of way here
[15:06:10] RECOVERY - Host ganeti1004 is UP: PING OK - Packet loss = 0%, RTA = 2.12 ms
[15:06:13] my fault, I am the one that waits
[15:06:49] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:301098|beta wgEchoMentionStatusNotifications default true]] (duration: 01m 28s)
[15:06:49] RECOVERY - Disk space on ganeti1004 is OK: DISK OK
[15:06:53] I give way to the Hon. Gentlemen Mr. Speaker :)
[15:06:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:07:01] RECOVERY - MD RAID on ganeti1004 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[15:07:32] umm. There are lots of host-key verification failures in scap mw102[0-5] + mw101{8,9} is this a known/expected thing?
[15:07:46] so mw1018-mw1025
[15:07:53] that didn't happen to me
[15:08:09] oh, it did
[15:08:13] I didn't notice it
[15:08:21] RECOVERY - puppet last run on ganeti1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:08:31] let me check them
[15:08:36] ah, 1018 and 1019 have no route to host, not host key errors :(
[15:08:47] thanks
[15:08:52] thcipriani: I just decommed them :(
[15:09:10] we need a dsh update them, elukey ?
[15:09:20] *then
[15:09:52] I think it was already done by my puppet change
[15:09:57] Operations, MediaWiki-API, MediaWiki-Categories: API query allcategories finds ghost categories - https://phabricator.wikimedia.org/T141443#2498991 (doctaxon) and finds categories like Category:16th century in England , which is enwiki but not dewiki
[15:09:59] ah probably it didn't run on tin
[15:10:03] let me see
[15:10:09] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Puppet has 2 failures
[15:10:10] PROBLEM - puppet last run on mw1242 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:10:30] PROBLEM - puppet last run on ms-be2023 is CRITICAL: CRITICAL: puppet fail
[15:10:35] 18-25, right?
[15:10:41] PROBLEM - puppet last run on strontium is CRITICAL: CRITICAL: Puppet has 1 failures
[15:10:52] yup, 20-25 have unknown host keys
[15:10:58] jynus: yeah, but I've only shutdown 18/19
[15:10:59] PROBLEM - puppet last run on mw1099 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:11:16] but the others are supposed to be active?
[15:11:20] PROBLEM - puppet last run on mw2066 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:11:22] (PS1) Muehlenhoff: Remove old trusty image scalers from site.pp [puppet] - https://gerrit.wikimedia.org/r/301386
[15:11:29] PROBLEM - puppet last run on mw2156 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:11:30] PROBLEM - puppet last run on ms-fe2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:11:49] I can accept these hostkeys, it doesn't look like they conflict with any known_host keys, but I thought that was taken care of in prod?
[15:11:51] jynus: nope they have been removed by puppet with my last puppet merge
[15:12:00] PROBLEM - puppet last run on mw1117 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:13:23] first of all, lets run puppet on tin, see if it has penfing changes
[15:13:30] already done
[15:13:30] RECOVERY - puppetmaster backend https on rhodium is OK: HTTP OK: Status line output matched 400 - 333 bytes in 1.920 second response time
[15:14:51] my change was in https://gerrit.wikimedia.org/r/#/c/301342
[15:16:27] cool, so seems like once puppet runs on tin should be good :)
[15:16:28] Operations, ops-eqiad: ms-be1021.eqiad.wmnet: slot=1I:1:2 dev=sdh failed - https://phabricator.wikimedia.org/T139767#2499003 (fgiunchedi) Resolved>Open a:Cmjohnson>fgiunchedi thanks @Cmjohnson ! reopening and assigning to me. I made a mistake by removing the LD because now the others will...
[15:17:07] (PS2) Thcipriani: Turn on textcat based language detection for search [mediawiki-config] - https://gerrit.wikimedia.org/r/301369 (owner: EBernhardson)
[15:18:16] (PS1) Muehlenhoff: Remove old trusty scalers from installer config [puppet] - https://gerrit.wikimedia.org/r/301388
[15:18:39] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/301369 (owner: EBernhardson)
[15:18:54] (CR) Muehlenhoff: [C: 2] Remove old trusty image scalers from site.pp [puppet] - https://gerrit.wikimedia.org/r/301386 (owner: Muehlenhoff)
[15:19:04] (Merged) jenkins-bot: Turn on textcat based language detection for search [mediawiki-config] - https://gerrit.wikimedia.org/r/301369 (owner: EBernhardson)
[15:19:37] ebernhardson: on your change, fine to sync InitialiseSettings.php first then CirrusSearch-common? That shouldn't explode anything, right?
[15:19:43] thcipriani: lemme double check
[15:19:48] thanks
[15:20:16] Operations, MediaWiki-API, MediaWiki-Categories: API query allcategories finds ghost categories - https://phabricator.wikimedia.org/T141443#2499024 (Anomie)
[15:20:21] RECOVERY - cassandra-b CQL 10.192.48.47:9042 on restbase2005 is OK: TCP OK - 0.036 second response time on port 9042
[15:20:26] thcipriani: InitialiseSettings.php can go first
[15:20:38] is everthing good now? I got distracted with another task?
[15:21:04] jynus: it looks like it, those hosts are gone from the dsh files, thanks!
[15:21:20] ebernhardson: ack, okie doke
[15:21:56] !log T134016: Restarting Cassandra instance to apply disabled streaming socket timeout (restbase2006-a.codfw.wmnet)
[15:21:57] T134016: RESTBase Cassandra cluster: Increase instance count to 3 - https://phabricator.wikimedia.org/T134016
[15:22:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:22:16] (CR) Muehlenhoff: [C: 2] Remove old trusty scalers from installer config [puppet] - https://gerrit.wikimedia.org/r/301388 (owner: Muehlenhoff)
[15:22:16] ebernhardson: change has been pulled on mw1099. Possible to check anything there before rollout?
[15:22:30] thcipriani: i can check if it works, yea
[15:22:31] sec
[15:23:57] thcipriani: oh, it's only InitialiseSettings.php and not the other? can't test it yet
[15:24:36] hmm, that should be everything on mw1099
[15:24:41] otherwise, if both are pulled then it's not working for some reason :S sec
[15:24:58] * thcipriani double checks
[15:25:40] RECOVERY - NTP on ganeti1004 is OK: NTP OK: Offset -0.00679910183 secs
[15:25:47] looks to all be there
[15:25:51] PROBLEM - mediawiki-installation DSH group on mw1156 is CRITICAL: Host mw1156 is not in mediawiki-installation dsh group
[15:25:51] PROBLEM - mediawiki-installation DSH group on mw1155 is CRITICAL: Host mw1155 is not in mediawiki-installation dsh group
[15:26:26] thcipriani: hmm, mwrepl from mw1099 for enwiki shows $wgCirrusSearchEnableAltLanguage is false, but shuld be true
[15:26:31] PROBLEM - cassandra-a CQL 10.192.48.49:9042 on restbase2006 is CRITICAL: Connection refused
[15:26:35] afaict from the patch that should hold
[15:27:01] perhaps sync order was iffy, and it needs a touch on InitialiseSettings.php?
[15:27:37] yeah, there wasn't really a sync-order, I just do a scap pull on mw1099, I can try that again.
[15:27:37] Operations, MediaWiki-API, MediaWiki-Categories: API query allcategories finds ghost categories - https://phabricator.wikimedia.org/T141443#2498937 (Anomie) Prior to 1.28.0-wmf.12, the definition of a category used by list=allcategories is "has ever had an entry". So, for example, if someone imported...
[15:28:42] thcipriani: sorry I was in a meeting, the patch that I've merged put mw1099 as explicit canary with mw1017, because they are test appservers.. Not sure if I needed to to something more, apologies in case :(
[15:29:25] elukey: nah, patch seems fine now, it just hadn't run on tin so scap through some errors about not being able to connect, etc.
[15:29:34] super
[15:29:51] ACKNOWLEDGEMENT - cassandra-a CQL 10.192.48.49:9042 on restbase2006 is CRITICAL: Connection refused eevans Slow startup (T141110) - The acknowledgement expires at: 2016-07-27 16:29:26.
[15:29:55] just FYI, I am in the process of decom appservers and api servers between today/tomorrow
[15:30:03] https://etherpad.wikimedia.org/p/appservers-decom
[15:30:49] PROBLEM - mediawiki-installation DSH group on mw1153 is CRITICAL: Host mw1153 is not in mediawiki-installation dsh group
[15:30:56] this is weird
[15:31:13] thcipriani: found the problem, wmfCirrusSearchEnableAltLanguage is in InitialiseSettings.php twice, sec i'll make a patch to fix
[15:31:23] ebernhardson: nice, thank you.
[15:31:31] hooray mw1099 found a problem :)
[15:31:35] ish
[15:32:40] :-)
[15:32:46] (PS1) EBernhardson: Remove second copy of wmfCirrusSearchEnableAltLanguage [mediawiki-config] - https://gerrit.wikimedia.org/r/301389
[15:32:48] (PS3) Paladox: Fix gitweb.file link for diffusion [puppet] - https://gerrit.wikimedia.org/r/301381 (https://phabricator.wikimedia.org/T141420)
[15:33:06] (PS2) EBernhardson: Remove second copy of wmgCirrusSearchEnableAltLanguage [mediawiki-config] - https://gerrit.wikimedia.org/r/301389
[15:33:10] thcipriani: ^ should do the trick
[15:33:27] ah image scalers!
[15:33:55] (PS3) EBernhardson: Remove second copy of wmgCirrusSearchEnableAltLanguage [mediawiki-config] - https://gerrit.wikimedia.org/r/301389
[15:33:59] (PS4) Thcipriani: Remove second copy of wmgCirrusSearchEnableAltLanguage [mediawiki-config] - https://gerrit.wikimedia.org/r/301389 (owner: EBernhardson)
[15:34:04] i guess we both rebased it :)
[15:34:10] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/301389 (owner: EBernhardson)
[15:34:24] so about mw1156, mw1155 and mw1153 - moritzm is decomissioning them
[15:34:36] not sure if anything was logged before
[15:34:37] (Merged) jenkins-bot: Remove second copy of wmgCirrusSearchEnableAltLanguage [mediawiki-config] - https://gerrit.wikimedia.org/r/301389 (owner: EBernhardson)
[15:34:43] all good then
[15:35:02] thcipriani: I am going to shutdown mw102[0-5] now
[15:35:04] RECOVERY - puppet last run on strontium is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[15:35:09] Operations, Reading-Infrastructure-Team, Services, Services-next, Security-General: Protect sensitive user-related information with a UserData / auth / session service - https://phabricator.wikimedia.org/T140813#2499077 (GWicke)
[15:35:12] (PS1) Eevans: configurable trickle_fsync [puppet] - https://gerrit.wikimedia.org/r/301390 (https://phabricator.wikimedia.org/T140825)
[15:35:16] RECOVERY - puppet last run on mw2156 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[15:35:21] ebernhardson: now check on mw1099 please
[15:35:44] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[15:35:45] thcipriani: success!
[15:35:54] RECOVERY - puppet last run on ms-fe2001 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[15:36:05] RECOVERY - puppet last run on mw1242 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[15:36:09] elukey: they should be gone from icinga now, puppet run on neon took rather long to complete
[15:36:16] ebernhardson: nice, OK, I'll move forward with InitialiseSettings.php then CirrusSearch-common.php
[15:36:25] thcipriani: yup
[15:36:34] moritzm: sure sorry for the ping!
[15:36:54] RECOVERY - puppet last run on mw1099 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:36:54] RECOVERY - puppet last run on ms-be2023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:37:15] RECOVERY - puppet last run on mw1117 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:37:23] (PS2) Eevans: configurable trickle_fsync [puppet] - https://gerrit.wikimedia.org/r/301390 (https://phabricator.wikimedia.org/T140825)
[15:37:45] RECOVERY - puppet last run on mw2066 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:37:49] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:301369|Turn on textcat based language detection for search]] PART I (duration: 00m 27s)
[15:37:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:38:19] !log thcipriani@tin Synchronized wmf-config/CirrusSearch-common.php: SWAT: [[gerrit:301369|Turn on textcat based language detection for search]] PART II (duration: 00m 23s)
[15:38:23] ^ ebernhardson check live please
[15:38:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:38:56] thcipriani: also looks to be working, thanks!
[15:39:04] ebernhardson: awesome, thanks for checking :)
[15:39:30] (PS2) Thcipriani: Add 'upwizcampeditors' to $wgAddGroups, $wgRemoveGroups for commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/301371 (https://phabricator.wikimedia.org/T141431) (owner: Bartosz Dziewoński)
[15:39:49] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/301371 (https://phabricator.wikimedia.org/T141431) (owner: Bartosz Dziewoński)
[15:39:52] (PS4) Ppchelko: Change-Prop: Updates to error handling [puppet] - https://gerrit.wikimedia.org/r/300681
[15:40:14] (Merged) jenkins-bot: Add 'upwizcampeditors' to $wgAddGroups, $wgRemoveGroups for commonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/301371 (https://phabricator.wikimedia.org/T141431) (owner: Bartosz Dziewoński)
[15:41:51] MatmaRex: your change is on mw1099 check please
[15:42:09] Operations, ops-eqiad: ms-be1021.eqiad.wmnet: slot=1I:1:2 dev=sdh failed - https://phabricator.wikimedia.org/T139767#2499086 (Cmjohnson) Return part UPS tracking numbers picture attached{F4312382}
[15:43:08] Operations, ops-eqiad, media-storage: diagnose failed disks on ms-be1027 - https://phabricator.wikimedia.org/T140374#2499087 (Cmjohnson) Return tracking info for wrong disks tracking info. {F4312390}
[15:43:24] thcipriani: ugh, can you just deploy it? i'd have to make a test account on production commons with sysop rights (and no ohter rights)
[15:43:38] which probably breaks like five wmf policies all at once
[15:43:39] elukey: heh, you beat me to it
[15:44:07] oh
[15:44:25] thcipriani: if you're not comfortable with that then please revert :/ it's hopefully not a huge issue for commons
[15:44:26] MatmaRex: need sysops at commons? I can grant 9_9
[15:44:57] (CR) Eevans: [C: 1] "Puppet compiler output here: http://puppet-compiler.wmflabs.org/3505/" [puppet] - https://gerrit.wikimedia.org/r/301390 (https://phabricator.wikimedia.org/T140825) (owner: Eevans)
[15:44:58] mafk: hmm. are you a sysop there maybe? that'd be easiest
[15:45:05] PROBLEM - Unmerged changes on repository puppet on rhodium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production).
[15:45:07] MatmaRex: nope, just steward
[15:45:26] MatmaRex: deploying, change seems innocuous
[15:45:32] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:301371|Add "upwizcampeditors" to $wgAddGroups, $wgRemoveGroups for commonswiki]] (duration: 00m 24s)
[15:45:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:45:37] ^ MatmaRex check please
[15:45:43] but I think Bsadowski1 is?
[15:45:47] i'd need a sysop to check on https://commons.wikimedia.org/wiki/Special:UserRights/Matma_Rex that they can grant "campaign editor" privileges
[15:46:02] (don't grant them, just see that you have the checkbox and it's enabled)
[15:46:10] I resigned my sysops there some years ago :(
[15:46:11] I'm not a local crat there
[15:46:15] or sysop
[15:46:16] Operations, ops-eqiad, Patch-For-Review: Decommission all old mediawiki appservers in eqiad - https://phabricator.wikimedia.org/T139353#2499092 (elukey)
[15:46:31] Dereckson is IIRC?
[15:46:42] Uh, I don't know who might be on that can test it on Commons.
[15:46:58] we just need a sysop
[15:47:11] thcipriani: eh. i'll just follow up on https://phabricator.wikimedia.org/T141431 ? nothing seems to be broken
[15:47:12] Hey mafk :)
[15:47:21] hey Bsadowski1
[15:47:23] MatmaRex: ack.
[15:47:29] ok. thanks :)
[15:47:49] in theory I can grant myself sysop rights and check, but I'll behead myself before they do
[15:48:25] RECOVERY - cassandra-a CQL 10.192.48.49:9042 on restbase2006 is OK: TCP OK - 0.036 second response time on port 9042
[15:48:35] (PS2) GWicke: Service::node: Capture stdout and stderr in journal [puppet] - https://gerrit.wikimedia.org/r/301309 (https://phabricator.wikimedia.org/T136957)
[15:48:35] MatmaRex: Add groups: Rollbackers, Confirmed users, Patrollers, Autopatrollers, File movers, Image reviewers, Upload Wizard campaign editors and IP block exemptions
[15:48:37] Remove groups: Rollbackers, Confirmed users, Patrollers, Autopatrollers, File movers, Image reviewers, Upload Wizard campaign editors and IP block exemptions
[15:48:38] Add group to own account: Translation administrators
[15:48:40] Remove group from own account: Translation administrators
[15:48:41] it's working
[15:48:44] (CR) Elukey: [C: 1] "Nop for PCC https://puppet-compiler.wmflabs.org/3505/" [puppet] - https://gerrit.wikimedia.org/r/301390 (https://phabricator.wikimedia.org/T140825) (owner: Eevans)
[15:48:52] (PS3) Thcipriani: IP cap lift for Wikipedia Edit-a-thon on 2016-08-03 [mediawiki-config] - https://gerrit.wikimedia.org/r/301372 (https://phabricator.wikimedia.org/T141421) (owner: MarcoAurelio)
[15:48:53] at least from Special:ListGroupRights perspective
[15:49:01] https://commons.wikimedia.org/wiki/Special:ListGroupRights ?
[15:49:01] mafk: oh, right, there's ListGroupRights. okay
[15:49:08] i forgot about it. okay
[15:49:16] we all did for a while :)
[15:49:17] !log T134016: Restarting Cassandra instance to apply disabled streaming socket timeout (restbase2006-b.codfw.wmnet)
[15:49:18] T134016: RESTBase Cassandra cluster: Increase instance count to 3 - https://phabricator.wikimedia.org/T134016
[15:49:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:49:30] urandom: shall we merge? 301390
[15:49:49] not sure if you are planning tests with that option
[15:49:53] but it sounds interesting
[15:49:55] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/301372 (https://phabricator.wikimedia.org/T141421) (owner: MarcoAurelio)
[15:50:01] elukey: nope, it's safe to merge
[15:50:17] elukey: i'm going to follow it up with something that uses it (for the restbase cluster)
[15:50:21] (Merged) jenkins-bot: IP cap lift for Wikipedia Edit-a-thon on 2016-08-03 [mediawiki-config] - https://gerrit.wikimedia.org/r/301372 (https://phabricator.wikimedia.org/T141421) (owner: MarcoAurelio)
[15:50:34] urandom: all right merging
[15:50:50] elukey: i got a meeting here in a sec, but we can also chat about any value this might have for you
[15:51:29] elukey: though if you're using LCS, any values you use would be different than ours, which is one reason i'm starting with this changeset
[15:51:30] (CR) Elukey: [C: 2] configurable trickle_fsync [puppet] - https://gerrit.wikimedia.org/r/301390 (https://phabricator.wikimedia.org/T140825) (owner: Eevans)
[15:52:04] mafk: throttle is live on mw1099, no catastrophic errors there, anything to check before rolling everywhere?
[15:52:19] thcipriani: it's un-checkable
[15:52:42] any errors anywhere?
[15:52:55] nope, all looks good. That's all I could think to check :)
[15:53:11] ack, I think we can deploy
[15:53:20] urandom: sure! Just merged, going to run puppet on a couple of restbase hosts just to be sure
[15:53:34] RECOVERY - Unmerged changes on repository puppet on rhodium is OK: No changes to merge.
[15:53:52] elukey: kk
[15:54:18] !log thcipriani@tin Synchronized wmf-config/throttle.php: SWAT: [[gerrit:301372|IP cap lift for Wikipedia Edit-a-thon on 2016-08-03]] (duration: 00m 23s)
[15:54:23] thcipriani: I think I am up then?
[15:54:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:54:25] PROBLEM - cassandra-b CQL 10.192.48.50:9042 on restbase2006 is CRITICAL: Connection refused
[15:54:28] ^ mafk sync'd
[15:54:34] addshore: yup, you're up.
[15:54:39] lemme know if you need anything.
[15:54:44] ktnks thcipriani :D
[15:55:08] thcipriani: one question, whats the policy on skipping CI for things like https://gerrit.wikimedia.org/r/#/c/301356/ which the CI would take some time on? / its already run once....
[15:55:14] ACKNOWLEDGEMENT - cassandra-b CQL 10.192.48.50:9042 on restbase2006 is CRITICAL: Connection refused eevans Slow startup (T141110) - The acknowledgement expires at: 2016-07-27 16:54:42.
[15:55:54] addshore: yeah, but force-merge clogs up CI so I don't do it, I just +2 and let CI run.
[15:56:03] okay! :)
[15:56:17] *twiddles thumbs*
[15:56:32] also: there's no rush, there aren't any other deploy slots until train time :)
[15:59:30] (PS1) Kaldari: Test numeric collation on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/301394 (https://phabricator.wikimedia.org/T141433)
[16:04:21] !log addshore@tin Synchronized php-1.28.0-wmf.11/includes/EditPage.php: {{gerrit|301356}} Count edit conflicts for each namespace separately (duration: 00m 32s)
[16:04:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:04:36] thcipriani: that one went fine, now for the last one!
[16:05:04] :D
[16:05:42] I can't wait for this european time swat window to appear ;)
[16:06:53] bah, CI failed on the second one because the CI mysql server vanished... *re runs it*
[16:07:30] +1 addshore re European SWAT :)
[16:07:50] PROBLEM - Host bellatrix is DOWN: PING CRITICAL - Packet loss = 100%
[16:08:40] PROBLEM - Disk space on phab2001 is CRITICAL: DISK CRITICAL - /var/spool/exim4/db is not accessible: Permission denied
[16:09:20] oops bellatrix is me, reboot took longer than the scheduled downtime
[16:10:19] RECOVERY - Host bellatrix is UP: PING OK - Packet loss = 0%, RTA = 37.24 ms
[16:10:59] RECOVERY - Disk space on phab2001 is OK: DISK OK
[16:11:08] mhhm thcipriani CI mysql keeps failing for this one :P https://integration.wikimedia.org/ci/job/mediawiki-extensions-php55/5985/ https://gerrit.wikimedia.org/r/#/c/301387/1 Guess I'll just keep trying...
[16:11:13] (CR) Thcipriani: [C: 1] "I've definitely run into this during SWAT a couple times, would be nice to have :)" [puppet] - https://gerrit.wikimedia.org/r/301327 (owner: Chad)
[16:11:18] (CR) 20after4: [C: 1] "definitely a good idea" [puppet] - https://gerrit.wikimedia.org/r/301327 (owner: Chad)
[16:13:00] (PS4) Paladox: Fix gitweb.file link for diffusion [puppet] - https://gerrit.wikimedia.org/r/301381 (https://phabricator.wikimedia.org/T141420)
[16:14:00] addshore: looks like mysql just fell over on that box, they talked about it in -releng a second ago. Give it another try, sorry :(
[16:14:11] 3rd time lucky!
[16:15:24] looking good this time!
[16:15:50] (CR) Ema: [C: 1] "LGTM and to pcc" [puppet] - https://gerrit.wikimedia.org/r/301373 (https://phabricator.wikimedia.org/T131501) (owner: BBlack)
[16:19:01] mafk: thanks for the ping
[16:19:01] MatmaRex: works: (change visibility) 2016-07-27T16:18:48 Dereckson (talk | contribs | block) changed group membership for Matma Rex from patroller to patroller and Upload Wizard campaign editor (Test for a current issue (see phab:T141431))
[16:19:02] T141431: Can't grant Upload Wizard campaign editor on commons - https://phabricator.wikimedia.org/T141431
[16:19:36] thanks
[16:19:40] you're welcome
[16:20:04] Dereckson: no problem :)
[16:20:38] Steinsplitter: so, the issue you reported sooner (sysop add of Upload Wizard campaign editor right) seems solved
[16:21:39] RECOVERY - cassandra-b CQL 10.192.48.50:9042 on restbase2006 is OK: TCP OK - 0.041 second response time on port 9042
[16:22:10] PROBLEM - puppet last run on db2057 is CRITICAL: CRITICAL: puppet fail
[16:22:22] !log addshore@tin Synchronized php-1.28.0-wmf.12/extensions/MobileFrontend/includes/skins/SkinMinerva.php: {{gerrit|301387}} Fix watchstar for logged-out user (duration: 00m 32s)
[16:22:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:22:44] thcipriani: jynus that is me (and swat) all done!
[16:22:53] addshore, thank you
[16:22:58] I will deploy soon
[16:29:37] Dreckson: Yepp :-) *happy*
[16:29:43] thanks to MatmaRex :)):)
[16:33:33] !log T134016: Restarting Cassandra instance to apply disabled streaming socket timeout (restbase2009-a.codfw.wmnet)
[16:33:34] T134016: RESTBase Cassandra cluster: Increase instance count to 3 - https://phabricator.wikimedia.org/T134016
[16:33:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:35:02] (CR) Bmansurov: [C: 1] Change default gallery mode to 'packed' on the English Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/301129 (https://phabricator.wikimedia.org/T141349) (owner: Jforrester)
[16:39:12] (CR) Hashar: "recheck" [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/301192 (owner: Hashar)
[16:39:51] PROBLEM - cassandra-a CQL 10.192.48.54:9042 on restbase2009 is CRITICAL: Connection refused
[16:40:39] (PS3) Andrew Bogott: Set up spice-based remote consoles for Labs instances [puppet] - https://gerrit.wikimedia.org/r/301294 (https://phabricator.wikimedia.org/T141399)
[16:42:31] ACKNOWLEDGEMENT - cassandra-a CQL 10.192.48.54:9042 on restbase2009 is CRITICAL: Connection refused eevans Delayed startup (T141110) - The acknowledgement expires at: 2016-07-27 17:41:50.
[16:49:40] RECOVERY - puppet last run on db2057 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[16:52:36] (PS5) Giuseppe Lavagetto: Change-Prop: Updates to error handling [puppet] - https://gerrit.wikimedia.org/r/300681 (owner: Ppchelko)
[16:53:19] (PS1) Andrew Bogott: Make sure that nova-network services are only running on the active host. [puppet] - https://gerrit.wikimedia.org/r/301401
[16:53:21] (PS1) Andrew Bogott: Install nova-network packages on labnet1001. [puppet] - https://gerrit.wikimedia.org/r/301402
[16:54:12] (CR) Giuseppe Lavagetto: [C: 2] Change-Prop: Updates to error handling [puppet] - https://gerrit.wikimedia.org/r/300681 (owner: Ppchelko)
[16:57:46] Operations, Continuous-Integration-Infrastructure, Packaging, Traffic: piuparts fail with WARN: Broken symlinks: /etc/systemd/system... - https://phabricator.wikimedia.org/T141454#2499359 (hashar)
[16:57:52] (PS1) BryanDavis: firejail: imagemagick rules must depend on package [puppet] - https://gerrit.wikimedia.org/r/301403
[16:57:54] (PS1) BryanDavis: mediawiki::php: /etc/php5/apache2 provided by libapache2-mod-php5 [puppet] - https://gerrit.wikimedia.org/r/301404
[16:57:56] (PS1) BryanDavis: scap::l10nupdate: Fix ~l10nupdate provisioning in Labs [puppet] - https://gerrit.wikimedia.org/r/301405
[16:57:56] Operations, Continuous-Integration-Infrastructure, Packaging, Traffic: piuparts fail with WARN: Broken symlinks: /etc/systemd/system... - https://phabricator.wikimedia.org/T141454#2499372 (hashar) p:Triage>Low
[17:02:10] (CR) jenkins-bot: [V: -1] scap::l10nupdate: Fix ~l10nupdate provisioning in Labs [puppet] - https://gerrit.wikimedia.org/r/301405 (owner: BryanDavis)
[17:04:40] (PS2) BryanDavis: scap::l10nupdate: Fix ~l10nupdate provisioning in Labs [puppet] - https://gerrit.wikimedia.org/r/301405
[17:05:34] Operations, Analytics, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2499403 (Milimetric) >>! In T135762#2497291, @BBlack wrote: >>>! In T135762#2497082, @ellery wrote: >> As far as I can tell, the proposed method also violates the more importa...
[17:06:01] (CR) jenkins-bot: [V: -1] scap::l10nupdate: Fix ~l10nupdate provisioning in Labs [puppet] - https://gerrit.wikimedia.org/r/301405 (owner: BryanDavis)
[17:06:16] ffs jerkins
[17:06:59] (PS3) BryanDavis: scap::l10nupdate: Fix ~l10nupdate provisioning in Labs [puppet] - https://gerrit.wikimedia.org/r/301405
[17:08:39] (PS2) Jcrespo: Depool db1055 also from the rc/log role [mediawiki-config] - https://gerrit.wikimedia.org/r/301384 (https://phabricator.wikimedia.org/T140650)
[17:09:34] (CR) Jcrespo: [C: 2] Depool db1055 also from the rc/log role [mediawiki-config] - https://gerrit.wikimedia.org/r/301384 (https://phabricator.wikimedia.org/T140650) (owner: Jcrespo)
[17:11:43] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 210, down: 1, dormant: 0, excluded: 1, unused: 0; xe-5/2/3: down - Core: cr2-codfw:xe-5/0/1 (Zayo, OGYX/120003//ZYO) 36ms {#2909} [10Gbps wave]
[17:14:03] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 26 probes of 237 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[17:14:47] (CR) Chad: [C: 1] firejail: imagemagick rules must depend on package [puppet] - https://gerrit.wikimedia.org/r/301403 (owner: BryanDavis)
[17:16:26] (CR) Thcipriani: [C: 1] "This has been driving me crazy!" [puppet] - https://gerrit.wikimedia.org/r/301405 (owner: BryanDavis)
[17:16:53] (CR) Chad: [C: 1] Fix gitweb.file link for diffusion [puppet] - https://gerrit.wikimedia.org/r/301381 (https://phabricator.wikimedia.org/T141420) (owner: Paladox)
[17:17:04] (CR) Chad: [C: 1] mediawiki::php: /etc/php5/apache2 provided by libapache2-mod-php5 [puppet] - https://gerrit.wikimedia.org/r/301404 (owner: BryanDavis)
[17:20:13] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 17 probes of 237 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[17:21:10] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1055 also from the rc/log role (duration: 00m 28s)
[17:21:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:22:23] RECOVERY - cassandra-a CQL 10.192.48.54:9042 on restbase2009 is OK: TCP OK - 0.033 second response time on port 9042
[17:22:42] !log Truncating "local_group_wikipedia_T_parsoid_section_offsets".data, "local_group_wikipedia_T_parsoid_dataW4ULtxs1oMqJ".data, and "local_group_wikipedia_T_parsoid_html".data in RESTBase staging
[17:22:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:23:21] !log running analyze table on enwiki.logging db1055 (depooled)
[17:23:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:27:03] PROBLEM - puppet last run on mw2231 is CRITICAL: CRITICAL: puppet fail
[17:31:21] (PS1) Muehlenhoff: Remove DNS entries for old trusty scalers [dns] - https://gerrit.wikimedia.org/r/301406 (https://phabricator.wikimedia.org/T141352)
[17:31:59] (PS1) Chad: Gerrit: One less line for exec{}. Cool! [puppet] - https://gerrit.wikimedia.org/r/301407
[17:32:33] Operations, hardware-requests: Decomission mw1153-mw1160 - https://phabricator.wikimedia.org/T141352#2499480 (MoritzMuehlenhoff) Systems are removed from puppet, salt, dsh, conftool, Icinga and shutdown.
[17:33:32] (Abandoned) Chad: Gerrit: One less line for exec{}. Cool! [puppet] - https://gerrit.wikimedia.org/r/301407 (owner: Chad)
[17:37:35] Operations, Reading-Infrastructure-Team, Services, Services-next, Security-General: Protect sensitive user-related information with a UserData / auth / session service - https://phabricator.wikimedia.org/T140813#2499499 (GWicke) Discussion notes from today's planning meeting are available at...
[17:37:42] PROBLEM - puppet last run on rhodium is CRITICAL: CRITICAL: Puppet has 3 failures
[17:38:13] PROBLEM - salt-minion processes on rhodium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[17:39:13] PROBLEM - DPKG on rhodium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[17:39:17] Operations, Datasets-General-or-Unknown, netops: dumps.wikimedia.org seems to have poor throughput towards some destinations - https://phabricator.wikimedia.org/T120425#2499531 (mark) p:High>Normal
[17:40:01] (PS5) Paladox: Fix gitweb.file link for diffusion [puppet] - https://gerrit.wikimedia.org/r/301381 (https://phabricator.wikimedia.org/T141420)
[17:40:09] (PS6) Paladox: Fix gitweb.file link for diffusion [puppet] - https://gerrit.wikimedia.org/r/301381 (https://phabricator.wikimedia.org/T141420)
[17:41:20] jouncebot: next
[17:41:21] In 1 hour(s) and 18 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160727T1900)
[17:41:23] RECOVERY - DPKG on rhodium is OK: All packages OK
[17:41:54] (CR) Dzahn: [C: 2] "yep, confirmed there is a 404 when clicking on those links in gitweb" [puppet] - https://gerrit.wikimedia.org/r/301381 (https://phabricator.wikimedia.org/T141420) (owner: Paladox)
[17:42:08] mutante ^^ thanks
[17:42:09] :)
[17:44:07] mutante ^^ i don't believe it started jenkins, please could you re c+2 again please?
[17:44:23] never mind
[17:44:25] merged now
[17:44:39] thanks
[17:44:40] :)
[17:45:19] !log gerrit is restarting to deploy config change 301381, a couple seconds downtime
[17:45:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:45:41] Yay the new gerrit maintenance screen shows
[17:45:48] yep:)
[17:45:54] no more ugly error message
[17:45:58] Yep
[17:46:01] Much better
[17:46:02] :)
[17:46:03] RECOVERY - puppet last run on rhodium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:46:11] done
[17:46:18] paladox: please confirm
[17:46:22] Yay works now
[17:46:25] great
[17:46:34] thank you
[17:46:44] np, thanks for fix
[17:46:50] You're welcome :)
[17:48:33] PROBLEM - restbase endpoints health on cerium is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.16.147, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[17:49:34] PROBLEM - Restbase root url on cerium is CRITICAL: Connection refused
[17:49:39] that's a test host
[17:50:34] RECOVERY - restbase endpoints health on cerium is OK: All endpoints are healthy
[17:50:54] /en.wikipedia.org/v1/?spec ?
[17:51:42] RECOVERY - Restbase root url on cerium is OK: HTTP OK: HTTP/1.1 200 - 15273 bytes in 0.015 second response time
[17:51:56] !log restarted grrriit-wm
[17:52:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:54:02] PROBLEM - cassandra-a CQL 10.64.16.153:9042 on cerium is CRITICAL: Connection refused
[17:54:23] PROBLEM - cassandra-a service on cerium is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed
[17:55:48] (CR) BryanDavis: [C: -1] mediawiki::php: /etc/php5/apache2 provided by libapache2-mod-php5 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/301404 (owner: BryanDavis)
[17:57:12] RECOVERY - puppet last run on mw2231 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:58:06] !log T134016: Restarting Cassandra instance to apply disabled streaming socket timeout (restbase2009-b.codfw.wmnet)
[17:58:08] T134016: RESTBase Cassandra cluster: Increase instance count to 3 - https://phabricator.wikimedia.org/T134016
[17:58:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:01:44] (PS2) BryanDavis: mediawiki::php: /etc/php5/apache2 provided by php5-dbg [puppet] - https://gerrit.wikimedia.org/r/301404
[18:01:46] (PS4) BryanDavis: scap::l10nupdate: Fix ~l10nupdate provisioning in Labs [puppet] - https://gerrit.wikimedia.org/r/301405
[18:01:48] (PS2) BryanDavis: scap: Allow configuration of the master rsync server in wmflabs [puppet] - https://gerrit.wikimedia.org/r/301408
[18:02:31] (PS2) Chad: Gerrit: Remove old library linking [puppet] - https://gerrit.wikimedia.org/r/300932
[18:02:53] RECOVERY - cassandra-a service on cerium is OK: OK - cassandra-a is active
[18:04:14] PROBLEM - Unmerged changes on repository puppet on rhodium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production).
[18:04:22] PROBLEM - cassandra-b CQL 10.192.48.55:9042 on restbase2009 is CRITICAL: Connection refused
[18:04:33] RECOVERY - cassandra-a CQL 10.64.16.153:9042 on cerium is OK: TCP OK - 0.001 second response time on port 9042
[18:04:42] (PS1) Alexandros Kosiaris: akosiaris: Move my .htoprc to new location [puppet] - https://gerrit.wikimedia.org/r/301411
[18:05:27] (PS1) Giuseppe Lavagetto: puppetmaster: pin all the needed packages [puppet] - https://gerrit.wikimedia.org/r/301412
[18:05:46] <_joe_> akosiaris: ^^
[18:06:46] (CR) Alexandros Kosiaris: [C: 1] puppetmaster: pin all the needed packages [puppet] - https://gerrit.wikimedia.org/r/301412 (owner: Giuseppe Lavagetto)
[18:06:55] (CR) Alexandros Kosiaris: [C: 2 V: 2] akosiaris: Move my .htoprc to new location [puppet] - https://gerrit.wikimedia.org/r/301411 (owner: Alexandros Kosiaris)
[18:08:02] !log deploy restbase 8efbc9282e to staging
[18:08:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:08:52] ACKNOWLEDGEMENT - cassandra-b CQL 10.192.48.55:9042 on restbase2009 is CRITICAL: Connection refused eevans Slow start (T141110) - The acknowledgement expires at: 2016-07-27 19:08:25.
[18:09:52] RECOVERY - salt-minion processes on rhodium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[18:10:06] ACKNOWLEDGEMENT - cassandra-c service on restbase2005 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is failed eevans Down to coordinate bootstrap - The acknowledgement expires at: 2016-07-27 20:09:34.
[18:10:43] RECOVERY - Unmerged changes on repository puppet on rhodium is OK: No changes to merge.
[18:14:01] (CR) Thcipriani: [C: 1] scap: Allow configuration of the master rsync server in wmflabs [puppet] - https://gerrit.wikimedia.org/r/301408 (owner: BryanDavis)
[18:14:27] (PS1) Alexandros Kosiaris: puppetmaster: Conditionally set always_cache_features [puppet] - https://gerrit.wikimedia.org/r/301413 (https://phabricator.wikimedia.org/T98173)
[18:15:01] (PS1) Giuseppe Lavagetto: puppetmaster: fix master.conf on 3.7+ servers [puppet] - https://gerrit.wikimedia.org/r/301414
[18:15:05] <_joe_> oh man
[18:15:09] hehehe
[18:15:13] got ya :P
[18:15:23] <_joe_> mine is more beautiful though
[18:15:32] <_joe_> now I'm going to dinner, will be back later
[18:15:40] yeah indeed
[18:15:45] same here
[18:15:46] <_joe_> if you want to merge either of the patches, be my guest
[18:15:50] won't be back later though
[18:16:03] <_joe_> eheh ok tomorrow then :P
[18:17:10] (PS2) Alexandros Kosiaris: puppetmaster: Conditionally set always_cache_features [puppet] - https://gerrit.wikimedia.org/r/301413 (https://phabricator.wikimedia.org/T98173)
[18:18:38] (CR) Chad: [C: 1] "Functionally ok. Nitpick inline that's probably me just caring too much." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/301408 (owner: BryanDavis)
[18:19:52] (CR) Thcipriani: "Definitely seems like the right thing to do. Some comments inline." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/301409 (owner: Chad)
[18:22:45] (CR) Dzahn: [C: -1] "Good idea, yes, have been pinged about these myself a couple times.
But i think you need to put a wrapper around the command for it to wor" [puppet] - https://gerrit.wikimedia.org/r/301327 (owner: Chad)
[18:25:23] (CR) Chad: Scap3: Go ahead and `scap deploy --init` a freshly provisioned repo (2 comments) [puppet] - https://gerrit.wikimedia.org/r/301409 (owner: Chad)
[18:28:38] Operations, ops-eqiad: plug frqueue1001 into pfw1- ge-2/0/11 - https://phabricator.wikimedia.org/T141361#2499646 (Jgreen) Open>Resolved a:Jgreen done: 2/0/11 -> frqueue1001
[18:29:10] (CR) Thcipriani: Scap3: Go ahead and `scap deploy --init` a freshly provisioned repo (1 comment) [puppet] - https://gerrit.wikimedia.org/r/301409 (owner: Chad)
[18:29:58] (PS2) Chad: Scap3: Go ahead and `scap deploy --init` a freshly provisioned repo [puppet] - https://gerrit.wikimedia.org/r/301409
[18:30:30] Operations, ops-eqiad, fundraising-tech-ops, netops: put pfw1- ge-2/0/11 in the 'fundraising' vlan for new host frqueue1001 - https://phabricator.wikimedia.org/T140991#2499657 (Jgreen) Open>Resolved a:Jgreen
[18:30:49] thcipriani: Moved outside and made the clone of scap/ (if it happens) notify the exec{} so it's applied after.
[18:31:16] ostriches: nice :)
[18:31:19] * thcipriani looks
[18:31:20] Probably wanna run that on tin in puppet compiler to test though
[18:31:24] I could be wrong :)
[18:31:49] :D
[18:31:53] * ostriches does that now
[18:32:16] Operations, ops-eqiad, fundraising-tech-ops, netops: put pfw1- ge-2/0/11 in the 'fundraising' vlan for new host frqueue1001 - https://phabricator.wikimedia.org/T140991#2483556 (Jgreen)
[18:32:18] Operations, ops-eqiad: Survey available/unused ports on eqiad pfw's - https://phabricator.wikimedia.org/T141363#2499663 (Jgreen) Open>Resolved a:Jgreen
[18:32:59] (CR) GWicke: "@Giuseppe: My interpretation of T136957 is that stdout/stderr did not make it into the journal.
Do we have examples of std{out,err} actual" [puppet] - https://gerrit.wikimedia.org/r/301309 (https://phabricator.wikimedia.org/T136957) (owner: GWicke)
[18:33:31] thcipriani: Success! https://puppet-compiler.wmflabs.org/3509/tin.eqiad.wmnet/
[18:33:43] RECOVERY - cassandra-b CQL 10.192.48.55:9042 on restbase2009 is OK: TCP OK - 0.034 second response time on port 9042
[18:34:50] This should be a no-op pretty much until we deploy the next one
[18:34:51] (PS5) Mforns: Add white-list for EventLogging auto-purging [puppet] - https://gerrit.wikimedia.org/r/298721 (https://phabricator.wikimedia.org/T108850)
[18:35:05] (CR) Thcipriani: [C: 1] "Nice!" [puppet] - https://gerrit.wikimedia.org/r/301409 (owner: Chad)
[18:35:12] Considering they all have a DEPLOY_HEAD already
[18:35:38] yup, this will save that weird little step that I always almost forget about :)
[18:42:35] Operations, Reading-Infrastructure-Team, Services, Services-next, Security-General: Protect sensitive user-related information with a UserData / auth / session service - https://phabricator.wikimedia.org/T140813#2499686 (GWicke)
[18:50:49] !log restart changeprop to apply config changes 300681 and 301305
[18:50:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:53:45] Operations, Reading-Infrastructure-Team, Services, Services-next, Security-General: Protect sensitive user-related information with a UserData / auth / session service - https://phabricator.wikimedia.org/T140813#2499725 (GWicke)
[18:54:29] (PS2) BBlack: cache_misc: remove varnish3 VCL compat [puppet] - https://gerrit.wikimedia.org/r/301373 (https://phabricator.wikimedia.org/T131501)
[18:54:36] (CR) BBlack: [C: 2 V: 2] cache_misc: remove varnish3 VCL compat [puppet] - https://gerrit.wikimedia.org/r/301373 (https://phabricator.wikimedia.org/T131501) (owner: BBlack)
[18:54:59] (CR) BBlack: [C: 2] Revert "Revert "Revert "cache_misc: do not deliver expired cached objects""" [puppet] - https://gerrit.wikimedia.org/r/301374 (https://phabricator.wikimedia.org/T134989) (owner: BBlack)
[18:55:03] (PS2) BBlack: Revert "Revert "Revert "cache_misc: do not deliver expired cached objects""" [puppet] - https://gerrit.wikimedia.org/r/301374 (https://phabricator.wikimedia.org/T134989)
[18:55:07] (CR) BBlack: [V: 2] Revert "Revert "Revert "cache_misc: do not deliver expired cached objects""" [puppet] - https://gerrit.wikimedia.org/r/301374 (https://phabricator.wikimedia.org/T134989) (owner: BBlack)
[18:59:23] RECOVERY - cassandra-c service on restbase2005 is OK: OK - cassandra-c is active
[19:00:04] thcipriani: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160727T1900).
[19:01:17] * thcipriani does
[19:03:08] (PS2) Andrew Bogott: Install nova-network packages on labnet1001. [puppet] - https://gerrit.wikimedia.org/r/301402
[19:03:10] (PS2) Andrew Bogott: Make sure that nova-network services are only running on the active host. [puppet] - https://gerrit.wikimedia.org/r/301401
[19:04:19] !log T134016: Bootstrapping restbase2005-c.codfw.wmnet
[19:04:21] T134016: RESTBase Cassandra cluster: Increase instance count to 3 - https://phabricator.wikimedia.org/T134016
[19:04:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:04:41] (PS1) Thcipriani: group1 wikis to 1.28.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/301423
[19:06:02] Operations, Traffic, HTTPS, Tracking: HTTPS Plans (tracking / high-level info) - https://phabricator.wikimedia.org/T104681#2499805 (BBlack)
[19:06:04] Operations, Traffic: Clean up DNS/redirects for TLS - https://phabricator.wikimedia.org/T102824#2499802 (BBlack) Open>Resolved a:BBlack The title wording is unclear, which is perhaps why this ticket lingered open.
However, from the subtasks it's clear this was about exception cases within ou...
[19:06:23] (CR) Andrew Bogott: [C: 2] "Puppet compiler confirms this a no-op." [puppet] - https://gerrit.wikimedia.org/r/301401 (owner: Andrew Bogott)
[19:08:47] (PS1) Eevans: [WIP]: Configurable `vm.diry_background_bytes` parameter [puppet] - https://gerrit.wikimedia.org/r/301425
[19:09:32] (PS2) Eevans: [WIP]: Configurable `vm.diry_background_bytes` parameter [puppet] - https://gerrit.wikimedia.org/r/301425
[19:12:28] (CR) jenkins-bot: [V: -1] [WIP]: Configurable `vm.diry_background_bytes` parameter [puppet] - https://gerrit.wikimedia.org/r/301425 (owner: Eevans)
[19:13:23] Operations, Traffic, Wikimedia-Shop, HTTPS: store.wikimedia.org HTTPS issues - https://phabricator.wikimedia.org/T128559#2499855 (BBlack)
[19:13:26] (Abandoned) Andrew Bogott: Install nova-network packages on labnet1001. [puppet] - https://gerrit.wikimedia.org/r/301402 (owner: Andrew Bogott)
[19:13:48] Operations, Traffic, Wikimedia-Shop, HTTPS: store.wikimedia.org HTTPS issues - https://phabricator.wikimedia.org/T128559#2078914 (BBlack) Fixed up title and description to match what we need out of it today, rather than old audit data and/or old targets.
[19:14:53] Nikerabbit: around? jynus was asking which db server is producing the errors I described at https://phabricator.wikimedia.org/T141090
[19:15:44] is it wikishared? or one of the wikis? It makes more sense to me that it should be wikishared because it's a saving error, but I might be wrong
[19:15:55] Operations, Traffic, Wikimedia-Blog, HTTPS: Switch blog to HTTPS-only - https://phabricator.wikimedia.org/T105905#2499882 (BBlack) To re-iterate current status: the main thing missing here in the present is that the STS header is insufficient. It should be `strict-transport-security:max-age=3153...
[19:16:02] does the cxsave action also write anything to the wiki?
[19:16:33] !log rebuilding labnet1001 (it's a spare and shouldn't affect Labs)
[19:16:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:16:56] Operations, Traffic, HTTPS, Patch-For-Review: Enforce HTTPS+HSTS on remaining one-off sites in wikimedia.org that don't use standard cache cluster termination - https://phabricator.wikimedia.org/T132521#2499902 (BBlack)
[19:16:59] aharoni: no idea, sorry
[19:16:59] Operations, Traffic, Wikimedia-Blog, HTTPS: Switch blog to HTTPS-only - https://phabricator.wikimedia.org/T105905#2499901 (BBlack)
[19:17:15] aharoni: you could try checking logstash for more details if any
[19:18:07] Operations, Traffic, HTTPS, Patch-For-Review: Enforce HTTPS+HSTS on remaining one-off sites in wikimedia.org that don't use standard cache cluster termination - https://phabricator.wikimedia.org/T132521#2201391 (BBlack) Sub-tasks updated to be in sync with latest audit data. Primary issues here...
[19:22:37] my bet would be on something like this https://gerrit.wikimedia.org/r/#/c/298523/1/wmf-config/InitialiseSettings.php but I do not have the data to back that up
[19:26:30] (PS3) Eevans: [WIP]: Configurable `vm.diry_background_bytes` parameter [puppet] - https://gerrit.wikimedia.org/r/301425
[19:30:08] (CR) jenkins-bot: [V: -1] [WIP]: Configurable `vm.diry_background_bytes` parameter [puppet] - https://gerrit.wikimedia.org/r/301425 (owner: Eevans)
[19:36:29] I also see some "Deadlock found when trying to get lock; try restarting transaction" related to cx, but I already reported that
[19:36:57] jynus, Nikerabbit - sorry, disconnected
[19:37:26] (CR) BryanDavis: mediawiki::php: /etc/php5/apache2 provided by php5-dbg (1 comment) [puppet] - https://gerrit.wikimedia.org/r/301404 (owner: BryanDavis)
[19:37:27] jynus, Nikerabbit - the deadlock this is going to be resolved in the deployment tomorrow, isn't it? (Or is it something else?)
[19:37:57] aharoni: I am not aware of any particular commit fixing it
[19:38:00] I have no idea
[19:38:07] those are "normal"
[19:38:18] I reported when the level was too high
[19:38:25] and you did some changes
[19:38:26] !log deploy restbase 8f5e2897e to staging
[19:38:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:38:38] and now the level is low, nothing to worry about
[19:38:42] Nikerabbit: you mentioned some other recurring error that you said is going to be fixed on Thursday, but never mind that.
[19:38:54] I possibly missed some discussion that you had here,
[19:39:07] but I can provide timestamps for those readonly errors if it helps
[19:39:13] I don't have much more than timestamps.
[19:39:46] EventLogging logs source and target language, and the wiki where it was running (which should be the same as the target language)
[19:39:47] aharoni: hmm that was the php warning in relation to abuse filter
[19:39:57] Nikerabbit: right, probably that
[19:40:31] and it logs the trace, which is the same everywhere, and which I already posted in the task description
[19:40:36] Operations, Traffic, Browser-Support-Firefox, HTTPS: Secure connection failed when attempting to send POST request using HTTP/2 (if connection has been idle for a certain time) - https://phabricator.wikimedia.org/T134869#2499986 (BBlack) Open>Resolved a:BBlack I think either this has...
[19:40:41] and the source and target article titles
[19:40:52] and the username. that's it.
[19:43:17] (PS2) Dzahn: firejail: imagemagick rules must depend on package [puppet] - https://gerrit.wikimedia.org/r/301403 (owner: BryanDavis)
[19:43:57] Nikerabbit: I tried searching logstash by timestamp, but couldn't find anything relevant. Either it's not there (should it be?) or I looked at the wrong place.
[19:44:12] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/3514/ checked.
where there are changes it's the package dependency and it's installed." [puppet] - https://gerrit.wikimedia.org/r/301403 (owner: BryanDavis)
[19:44:45] Operations, Traffic, HTTPS, Tracking: HTTPS Plans (tracking / high-level info) - https://phabricator.wikimedia.org/T104681#2500003 (BBlack)
[19:45:36] (PS4) Eevans: [WIP]: Configurable `vm.diry_background_bytes` parameter [puppet] - https://gerrit.wikimedia.org/r/301425
[19:45:42] PROBLEM - puppet last run on restbase2007 is CRITICAL: CRITICAL: puppet fail
[19:49:50] (CR) BryanDavis: scap: Allow configuration of the master rsync server in wmflabs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/301408 (owner: BryanDavis)
[19:49:54] RECOVERY - puppet last run on restbase2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:50:09] Operations, Performance-Team, Services, Availability, and 2 others: Create BagOStuff subclass for HTTP - https://phabricator.wikimedia.org/T137272#2500020 (Smalyshev) BTW, right now we have deleteObjectsExpiringBefore implemented as stub. Should we implement some operation for it in REST API and...
[19:50:27] (CR) Dzahn: "yup, no-op" [puppet] - https://gerrit.wikimedia.org/r/301403 (owner: BryanDavis)
[19:51:03] (PS3) Dzahn: Gerrit: Remove all the junk to support 2.8 [puppet] - https://gerrit.wikimedia.org/r/300930 (owner: Chad)
[19:51:52] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 237 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[19:54:00] bblack: gerrit has "mid" ciphersuite and no complaints either so far (and i'm sure that meant more users than lists.wm web ui)
[19:54:22] or actually, just don't know the numbers
[19:56:21] basically all our one-off sites in our control do now if they're nginx or apache+jessie.
but the ones on apache+(trusty|precise) are stuck on 'compat' until the OS upgrades (at which point they'll upgrade automatically)
[19:57:44] (CR) Thcipriani: [C: 2] group1 wikis to 1.28.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/301423 (owner: Thcipriani)
[19:58:00] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 18 probes of 237 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[19:58:08] (Merged) jenkins-bot: group1 wikis to 1.28.0-wmf.12 [mediawiki-config] - https://gerrit.wikimedia.org/r/301423 (owner: Thcipriani)
[19:59:08] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.28.0-wmf.12
[19:59:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:59:39] bblack: yep, i saw that comment. (they will upgrade automatically with the OS upgrade)
[20:00:04] gwicke, cscott, arlolra, subbu, bearND, and mdholloway: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160727T2000). Please do the needful.
[20:00:34] you have a new service deployer since last night
[20:07:04] (CR) Chad: "Ah ok, didn't realize find exited with 0 when it found stuff. Easy enough to do will amend" [puppet] - https://gerrit.wikimedia.org/r/301327 (owner: Chad)
[20:10:48] (PS1) BBlack: cache_misc: ECDSA and OCSP Stapling for planet [puppet] - https://gerrit.wikimedia.org/r/301436
[20:11:35] Operations, Datasets-General-or-Unknown, netops: dumps.wikimedia.org seems to have poor throughput towards some destinations - https://phabricator.wikimedia.org/T120425#2500088 (Nemo_bis) Thanks mark for clarifying the subject.
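Dzahn's review of 301327 asked for "a wrapper around the command", and Chad replied about `find` exiting with 0. `find` exits 0 whenever it traverses its paths without error, whether or not anything matched. Its exit status therefore cannot answer "did it find anything?" when used as a guard (for example in a Puppet `exec` `onlyif`; the actual patch is not shown in this log). The usual wrapper tests whether `find` printed any paths, as the shell idiom `test -n "$(find ROOT -name PATTERN)"` does. The sketch below shows this in Python. The helper names `find_matches` and `found_anything` are hypothetical, and the `*.log` pattern is only an example.

```python
import subprocess
import tempfile
from pathlib import Path


def find_matches(root: str, pattern: str) -> subprocess.CompletedProcess:
    """Run `find ROOT -name PATTERN`, capturing stdout and the exit status."""
    return subprocess.run(["find", root, "-name", pattern], capture_output=True, text=True)


def found_anything(root: str, pattern: str) -> bool:
    """Hypothetical guard: true only if find ran cleanly AND printed at least one path.

    Mirrors the shell idiom: test -n "$(find ROOT -name PATTERN)"
    """
    result = find_matches(root, pattern)
    return result.returncode == 0 and result.stdout.strip() != ""


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # No matching files yet: find still exits 0, it just prints nothing.
        before = find_matches(tmp, "*.log")
        print("no match:", before.returncode, found_anything(tmp, "*.log"))

        # After creating a match the exit status is unchanged; only the output differs.
        Path(tmp, "example.log").touch()
        after = find_matches(tmp, "*.log")
        print("one match:", after.returncode, found_anything(tmp, "*.log"))
```

The design point is that the guard depends on `find`'s output rather than its exit code, which is what a wrapper around the command achieves in the shell.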
[20:13:40] (CR) BBlack: [C: 2 V: 2] cache_misc: ECDSA and OCSP Stapling for planet [puppet] - https://gerrit.wikimedia.org/r/301436 (owner: BBlack)
[20:18:03] there might be some intermittent cache_misc puppetfails, it's a dependency issue that resolves on second run (trying to just force them twice and avoid it)
[20:18:11] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master).
[20:19:20] (CR) Eevans: "I'd appreciate any feedback with respect to the approach here." [puppet] - https://gerrit.wikimedia.org/r/301425 (owner: Eevans)
[20:19:22] (CR) Eevans: "I'd appreciate any feedback with respect to the approach here." [puppet] - https://gerrit.wikimedia.org/r/301425 (owner: Eevans)
[20:35:04] Operations, Performance-Team, Services, Availability, and 2 others: Create BagOStuff subclass for HTTP - https://phabricator.wikimedia.org/T137272#2500119 (GWicke) @smalyshev, if we use the Cassandra backend we'll get fixed TTL expiry for free. Lets hold off on the interface until we know whether...
[20:36:42] (PS1) BBlack: planet.wm.o: use CSP:upgrade-insecure-requests to avoid mixed content [puppet] - https://gerrit.wikimedia.org/r/301467
[20:43:41] (CR) Dzahn: [C: 1] "the description on https://www.w3.org/TR/upgrade-insecure-requests/ totally fits the case here" [puppet] - https://gerrit.wikimedia.org/r/301467 (owner: BBlack)
[20:45:36] Operations, Traffic, Browser-Support-Internet-Explorer, HTTPS: Xbox 360 Internet Explorer unable to view Wikipedia - https://phabricator.wikimedia.org/T105455#2500130 (BBlack) Anyone have one to test on?
[20:46:07] (PS1) Chad: Git::clone: rename $default_source to $source and add github [puppet] - https://gerrit.wikimedia.org/r/301484
[20:46:39] (CR) Chad: "Param is not used yet by any callers so shouldn't impact anyone."
[puppet] - https://gerrit.wikimedia.org/r/301484 (owner: Chad)
[20:47:45] !log restarting rabbitmq-server on labcontrol1001 because sometimes that fixes a thing
[20:47:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:49:44] Operations, Traffic, Browser-Support-Internet-Explorer, HTTPS: Xbox 360 Internet Explorer unable to view Wikipedia - https://phabricator.wikimedia.org/T105455#1444194 (Paladox) I have an xbox 360, I will be near one on the weekend then I will have to search for where I put it. :).
[20:50:42] Operations, Traffic, Wikimedia-General-or-Unknown, HTTPS, JavaScript: Use Upgrade Insecure Requests on Wikimedia - https://phabricator.wikimedia.org/T101002#1326308 (BBlack) >>! In T101002#1326438, @Krinkle wrote: > This header currently results in Chrome changing all HTTP requests opened fro...
[20:52:28] Operations, Traffic, Wikimedia-General-or-Unknown, HTTPS, JavaScript: Use Upgrade Insecure Requests on Wikimedia - https://phabricator.wikimedia.org/T101002#2500147 (BBlack) Also, we may want to merge tasks one direction or the other with T36670. Personally, I'm a fan of setting this for all...
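The planet.wm.o patch (301467) and the T101002 discussion concern the `Content-Security-Policy: upgrade-insecure-requests` response header. Per the W3C "Upgrade Insecure Requests" spec linked in the review, the header tells the browser to rewrite `http:` subresource URLs (images, scripts, stylesheets) to `https:` before fetching them, so a page served over HTTPS no longer triggers mixed-content warnings. This log does not show where the header is actually set (web server or cache configuration). The Python WSGI middleware below is a hypothetical stand-in that only illustrates what emitting the header amounts to. The name `add_upgrade_insecure_requests` is made up, and so is the choice to leave an existing CSP header untouched.

```python
from typing import Callable, Iterable, List, Tuple

# The bare CSP directive from the W3C "Upgrade Insecure Requests" spec.
CSP_HEADER: Tuple[str, str] = ("Content-Security-Policy", "upgrade-insecure-requests")


def add_upgrade_insecure_requests(app: Callable) -> Callable:
    """Hypothetical WSGI middleware that appends the CSP directive to every response.

    If the wrapped app already sets a Content-Security-Policy header, that header is
    left as-is rather than duplicated (a simplifying assumption for this sketch).
    """

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        def patched_start_response(status: str, headers: List[Tuple[str, str]], exc_info=None):
            already_set = any(name.lower() == "content-security-policy" for name, _ in headers)
            if not already_set:
                headers = list(headers) + [CSP_HEADER]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


if __name__ == "__main__":
    # Tiny demo app serving a page with an http:// image reference (mixed content).
    def demo_app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/html")])
        return [b'<img src="http://example.org/logo.png">']

    def show_start_response(status, headers, exc_info=None):
        print(status)
        for name, value in headers:
            print(f"{name}: {value}")

    add_upgrade_insecure_requests(demo_app)({}, show_start_response)
```

The page body is unchanged. Only the response headers differ, and the browser does the actual rewriting of `http:` references.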
[20:57:07] (PS1) Andrew Bogott: TEMPORARY HACK: Add access_new_install to iron [puppet] - https://gerrit.wikimedia.org/r/301488 (https://phabricator.wikimedia.org/T138509)
[20:57:36] (PS2) Andrew Bogott: TEMPORARY HACK: Add access_new_install to iron [puppet] - https://gerrit.wikimedia.org/r/301488 (https://phabricator.wikimedia.org/T138509)
[21:00:45] Operations, Traffic, Wikimedia-Planet: mixed-content issues on planet.wikimedia.org - https://phabricator.wikimedia.org/T141480#2500163 (Dzahn)
[21:02:28] Operations, Traffic, Wikimedia-Planet: mixed-content issues on planet.wikimedia.org - https://phabricator.wikimedia.org/T141480#2500184 (Dzahn)
[21:03:05] (PS2) BBlack: planet.wm.o: use CSP:upgrade-insecure-requests to avoid mixed content [puppet] - https://gerrit.wikimedia.org/r/301467 (https://phabricator.wikimedia.org/T141480)
[21:04:08] i edited the commit message in web ui and bot shows it like that ^
[21:04:23] just added a ticket to it
[21:07:36] (CR) Andrew Bogott: [C: 2] TEMPORARY HACK: Add access_new_install to iron [puppet] - https://gerrit.wikimedia.org/r/301488 (https://phabricator.wikimedia.org/T138509) (owner: Andrew Bogott)
[21:07:59] mutante: https://phabricator.wikimedia.org/T141329 :)
[21:08:08] due to recent gerrit upgrade
[21:08:41] !log starting mobileapps deploy
[21:08:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:09:52] Krinkle: oh yea, i noticed that ticket
[21:10:23] 2.12.3 is gonna be soon-ish
[21:11:21] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[21:13:03] !log deployed mobileapps e561edf
[21:13:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:16:01] PROBLEM - Apache HTTP on mw1267 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:17:31] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[21:18:00] RECOVERY - Apache HTTP on mw1267 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 1.271 second response time
[21:22:15] (CR) Jhobs: [C: -1] Change default gallery mode to 'packed' on the English Wikipedia (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/301129 (https://phabricator.wikimedia.org/T141349) (owner: Jforrester)
[21:28:57] (PS3) BBlack: planet.wm.o: use CSP:upgrade-insecure-requests to avoid mixed content [puppet] - https://gerrit.wikimedia.org/r/301467 (https://phabricator.wikimedia.org/T141480)
[21:29:11] (CR) BBlack: [C: 2 V: 2] planet.wm.o: use CSP:upgrade-insecure-requests to avoid mixed content [puppet] - https://gerrit.wikimedia.org/r/301467 (https://phabricator.wikimedia.org/T141480) (owner: BBlack)
[21:32:42] Operations, Traffic, HTTPS, Tracking: HTTPS Plans (tracking / high-level info) - https://phabricator.wikimedia.org/T104681#2500330 (BBlack)
[21:34:07] !log planet2001 tmp disable puppet for testing
[21:34:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:34:24] that's the inactive instance
[21:46:11] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [50.0]
[21:50:11] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[21:56:52] (PS1) BBlack: ciphersuites: use dhe+3des in compat list [puppet] - https://gerrit.wikimedia.org/r/301501
[21:58:30] (CR) BBlack: [C: 2 V: 2] ciphersuites: use dhe+3des in compat list [puppet] - https://gerrit.wikimedia.org/r/301501 (owner: BBlack)
[22:00:39] (PS1) Eevans: relaxed directory permissions [puppet] - https://gerrit.wikimedia.org/r/301502
[22:01:12] (PS2) Eevans: relaxed directory permissions [puppet] - https://gerrit.wikimedia.org/r/301502
[22:02:52] (PS1) Yuvipanda: tools: Add simple loggingreceiver role [puppet] - https://gerrit.wikimedia.org/r/301503 (https://phabricator.wikimedia.org/T141270)
[22:03:18] bd808 ^ patch
[22:03:30] (CR) jenkins-bot: [V: -1] tools: Add simple loggingreceiver role [puppet] - https://gerrit.wikimedia.org/r/301503 (https://phabricator.wikimedia.org/T141270) (owner: Yuvipanda)
[22:03:46] (PS2) Yuvipanda: tools: Add simple loggingreceiver role [puppet] - https://gerrit.wikimedia.org/r/301503 (https://phabricator.wikimedia.org/T141270)
[22:03:55] brb
[22:05:03] (CR) jenkins-bot: [V: -1] tools: Add simple loggingreceiver role [puppet] - https://gerrit.wikimedia.org/r/301503 (https://phabricator.wikimedia.org/T141270) (owner: Yuvipanda)
[22:08:45] Operations, Traffic, Browser-Support-Internet-Explorer, HTTPS: Xbox 360 Internet Explorer unable to view Wikipedia - https://phabricator.wikimedia.org/T105455#2500368 (Paladox) Testing using the developer tools on internet explorer I set the user agent string to xbox 360 internet explorer and Wik...
[22:11:31] (CR) BryanDavis: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/301503 (https://phabricator.wikimedia.org/T141270) (owner: Yuvipanda)
[22:12:03] Operations, Traffic, Browser-Support-Internet-Explorer, HTTPS: Xbox 360 Internet Explorer unable to view Wikipedia - https://phabricator.wikimedia.org/T105455#1444194 (Platonides) @Paladox did you test it with an actual xbox 360? Or did you just spoof the UA on a Windows system?
[22:12:50] Operations, Traffic, Browser-Support-Internet-Explorer, HTTPS: Xbox 360 Internet Explorer unable to view Wikipedia - https://phabricator.wikimedia.org/T105455#2500385 (Paladox) >>! In T105455#2500369, @Platonides wrote: > @Paladox did you test it with an actual xbox 360? Or did you just spoof the...
[22:14:11] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[22:15:34] Operations, Traffic, Browser-Support-Internet-Explorer, HTTPS: Xbox 360 Internet Explorer unable to view Wikipedia - https://phabricator.wikimedia.org/T105455#2500408 (BBlack) This kind of testing needs the real thing unfortunately. The UA string doesn't have much impact here.
[22:16:11] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5057896 keys - replication_delay is 0
[22:24:54] (PS1) BryanDavis: [WIP] Provision Striker via scap3 [puppet] - https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014)
[22:27:15] (PS2) BryanDavis: [WIP] Provision Striker via scap3 [puppet] - https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014)
[22:28:08] (PS1) Addshore: Add dump_log_dir to stats::wmde config [puppet] - https://gerrit.wikimedia.org/r/301511 (https://phabricator.wikimedia.org/T119070)
[22:28:42] PROBLEM - puppet last run on cp2018 is CRITICAL: CRITICAL: puppet fail
[22:28:59] Operations, Release-Engineering-Team, User-greg: Institute a weekly review of all UBN! tasks - https://phabricator.wikimedia.org/T141130#2500477 (greg)
[22:38:17] (PS3) Yuvipanda: tools: Add simple loggingreceiver role [puppet] - https://gerrit.wikimedia.org/r/301503 (https://phabricator.wikimedia.org/T141270)
[22:38:30] Operations, Patch-For-Review, Wikimedia-Incident: Rotate (nutcracker) logs more frequently on terbium to save disk space - https://phabricator.wikimedia.org/T139786#2500507 (greg)
[22:38:45] (CR) Addshore: "Scheduled for Puppet SWAT on the 28th" [puppet] - https://gerrit.wikimedia.org/r/301511 (https://phabricator.wikimedia.org/T119070) (owner: Addshore)
[22:40:47] (CR) Yuvipanda: [C: 2] tools: Add simple loggingreceiver role [puppet] - https://gerrit.wikimedia.org/r/301503 (https://phabricator.wikimedia.org/T141270) (owner: Yuvipanda)
[22:43:00] RECOVERY - puppet last run on cp2018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:44:41] (PS1) Yuvipanda: tools: fix typo [puppet] - https://gerrit.wikimedia.org/r/301514
[22:44:57] Operations, Cassandra, hardware-requests, Wikimedia-Incident: Staging / Test environment(s) for RESTBase - https://phabricator.wikimedia.org/T136340#2500541 (greg)
[22:45:17] (CR) Yuvipanda: [C: 2 V: 2] tools: fix typo [puppet] - https://gerrit.wikimedia.org/r/301514 (owner: Yuvipanda)
[22:48:08] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:48:09] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:48:11] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:48:12] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:48:13] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:48:15] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:48:16] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:48:17] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:48:19] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:48:20] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:48:21] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:48:23] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:48:24] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:48:26] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:48:27] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:48:28] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:48:30] <[AlvaroMolina]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:50:36] !log restbase deploy cdd164c4e8 to staging
[22:50:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:50:48] <[UAwiki]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:50:49] <[UAwiki]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:51:50] <[UAwiki]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:51:52] <[UAwiki]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:51:53] <[UAwiki]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:51:55] <[UAwiki]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:51:56] <[UAwiki]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:52:23] <[UAwiki]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:52:24] <[UAwiki]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:52:26] <[UAwiki]> STOP KICKING ME OUT, YOU SHITTY JEM, OR I'LL KILL YOU..... AND I'LL MAKE YOU LEAVE WIKIMEDIA........ IS THAT CLEAR, SON OF A BITCH
[22:53:25] Operations, Discovery-Search-Backlog, Wikimedia-Incident: Enable GC (garbage collection) logs on Elasticsearch JVM - https://phabricator.wikimedia.org/T134853#2500571 (greg)
[22:54:53] Operations, Mobile-Content-Service, Parsing-Team, Services, and 2 others: Create functional cluster checks for all services (and have them page!) - https://phabricator.wikimedia.org/T134551#2500578 (greg)
[22:55:39] (PS3) BryanDavis: [WIP] Provision Striker via scap3 [puppet] - https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014)
[22:55:46] Operations, ops-codfw, Discovery, Wikidata, and 2 others: Adjust balance of WDQS nodes to allow continued operation if eqiad went offline. - https://phabricator.wikimedia.org/T124627#2500582 (greg)
[22:55:58] Operations, Discovery, Wikidata, Wikidata-Query-Service, Wikimedia-Incident: Deploy WDQS nodes on codfw - https://phabricator.wikimedia.org/T124862#2500583 (greg)
[22:58:23] Oh it is changing usernames and keeping same ip
[22:58:24] 159.203.58.213
[22:58:25] lol
[22:58:39] plus it doesn't speak english.
[22:59:08] it has been annoying us at the spanish channels
[22:59:21] oh
[22:59:23] then as we go banning him, he has been moving to other wikimedia channels, too
[22:59:26] it speaks spanish
[22:59:36] broken spanish actually :P
[22:59:39] oh
[22:59:49] It seems to know these channels are wikimedia
[22:59:55] I don't have +o here to have kicked him, though
[22:59:58] so it is targeting wikimedia on purpose
[23:00:04] RoanKattouw, ostriches, MaxSem, and Dereckson: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160727T2300). Please do the needful.
[23:00:04] kaldari and Amir1: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:00:05] yep, only ops do
[23:00:15] available
[23:00:16] yes, jem is a sysop
[23:00:23] oh
[23:00:32] and very active irc user
[23:00:36] oh
[23:01:04] stop kicking me jem or I'll kill you… with a number of insults
[23:01:39] oh
[23:01:41] Hi. I can SWAT this evening.
[23:02:38] o/
[23:04:03] \m/
[23:07:10] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [50.0]
[23:08:20] Dereckson: actually, can you cancel my SWAT deploy. I'm going to move it to later
[23:08:33] Dereckson: there is no way to test that patch in canary nodes, I guess we can just send it to all nodes
[23:08:44] it simply adds some data to logs
[23:09:30] kaldari: side question, where did you test uca-default-u-kn up to now?
[23:10:24] only locally. I guess I need to do beta cluster first huh?
[23:10:47] yes :)
[23:11:02] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[23:11:17] you guys don't like breaking stuff any more? ;)
[23:11:20] as a benefit run a maintenance script on a test instance will be faster than on test.
[23:11:43] (on a beta instance)
[23:12:29] (CR) Krinkle: scap: make deployment aware of canary machines (1 comment) [puppet] - https://gerrit.wikimedia.org/r/294742 (https://phabricator.wikimedia.org/T110068) (owner: Thcipriani)
[23:13:04] (CR) Dereckson: [C: -2] "Not yet tested on beta cluster, only locally." [mediawiki-config] - https://gerrit.wikimedia.org/r/301394 (https://phabricator.wikimedia.org/T141433) (owner: Kaldari)
[23:13:12] (Abandoned) Kaldari: Test numeric collation on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/301394 (https://phabricator.wikimedia.org/T141433) (owner: Kaldari)
[23:13:42] (PS1) Yuvipanda: tools: Depend on /srv [puppet] - https://gerrit.wikimedia.org/r/301517
[23:14:15] (CR) Yuvipanda: [C: 2 V: 2] tools: Depend on /srv [puppet] - https://gerrit.wikimedia.org/r/301517 (owner: Yuvipanda)
[23:18:17] Amir1: live on mw1099 (acked for non testable there)
[23:19:07] thanks
[23:21:31] !log dereckson@tin Synchronized php-1.28.0-wmf.12/extensions/ORES/includes/Cache.php: Add revision_id to log for errors (T141368, 1/2) (duration: 00m 31s)
[23:21:32] T141368: [Investigate] ORES time out errors in logs - https://phabricator.wikimedia.org/T141368
[23:21:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:21:50] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[23:22:38] !log dereckson@tin Synchronized php-1.28.0-wmf.12/extensions/ORES/maintenance/PopulateDatabase.php: Add revision_id to log for errors (T141368, 2/2, no-op) (duration: 00m 29s)
[23:22:39] T141368: [Investigate] ORES time out errors in logs - https://phabricator.wikimedia.org/T141368
[23:22:44] Amir1: here you are ^
[23:22:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:22:52] awesome
[23:23:05] thanks Dereckson, then I need to wait and check logs
[23:24:06] (CR) Thcipriani: scap: make deployment aware of canary machines (1 comment) [puppet] - https://gerrit.wikimedia.org/r/294742 (https://phabricator.wikimedia.org/T110068) (owner: Thcipriani)
[23:28:02] Amir1: would ORES use ApiQueryUserContributions?
[23:28:24] Dereckson: nope, We don't have any API hooks
[23:30:49] (PS1) Kaldari: Test numeric sorting on Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/301520
[23:34:16] Dereckson: and for testing on Beta Cluster, I don't need you to do anything right? Just merge it and test?
[23:36:11] That would create an undeployed change. We try to keep code deployed and repo in sync.
[23:36:55] Would you prefer that I deploy and sync it, or you?
[23:37:06] (PS4) BryanDavis: [WIP] Provision Striker via scap3 [puppet] - https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014)
[23:37:14] by the way, http://en.wikipedia.beta.wmflabs.org/wiki/Special:Statistics
[23:37:28] 5812 pages is probably suitable for a test
[23:37:59] true
[23:38:36] I can deploy it if you wish. Let's go.
[23:39:29] Cool. Also I have no idea how to run a maintenance script on the Beta Cluster (and couldn't find any documentation on how to). What server do I ssh to?
[23:39:48] Doesn't beta cluster run update.php after every commit?
[23:40:05] does update.php run undateCollation.php as well?
[23:40:13] yes
[23:40:14] er updateCollation
[23:40:22] oh, that's handy
[23:40:24] Assuming that the collation value has changed
[23:40:29] yes
[23:40:40] so it basically does updateCollation.php without the --force option
[23:40:41] no --force necessary
[23:40:51] which should be fine in this case
[23:41:13] Dereckson: OK, here's the new patch: https://gerrit.wikimedia.org/r/#/c/301520/
[23:41:45] (CR) Krinkle: scap: make deployment aware of canary machines (1 comment) [puppet] - https://gerrit.wikimedia.org/r/294742 (https://phabricator.wikimedia.org/T110068) (owner: Thcipriani)
[23:42:33] (PS2) Kaldari: Test numeric sorting on Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/301520 (https://phabricator.wikimedia.org/T141433)
[23:43:11] (CR) Dereckson: [C: 2] "SWAT (no-op in prod)" [mediawiki-config] - https://gerrit.wikimedia.org/r/301520 (https://phabricator.wikimedia.org/T141433) (owner: Kaldari)
[23:43:44] (Merged) jenkins-bot: Test numeric sorting on Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/301520 (https://phabricator.wikimedia.org/T141433) (owner: Kaldari)
[23:45:24] kaldari: https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update-eqiad/5376/console
[23:45:27] live on labs
[23:45:32] !log dereckson@tin Synchronized wmf-config/InitialiseSettings-labs.php: Test numeric sorting on Beta Cluster ([[Gerrit:301520]], labs only, no-op in prod) (duration: 00m 23s)
[23:45:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:45:42] thanks...
[23:47:26] bawolff: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/113062/console doesn't notify about update.php, it's not this task triggering it?
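The exchange above turns on which category sort keys get rebuilt. As described in the chat, the update only does work when the configured collation has changed, and updateCollation.php without --force leaves alone rows that already use the current collation. The Python sketch below models only that selection rule. It is not MediaWiki's PHP code: `CategoryLink` and `rows_to_update` are hypothetical names, and "uppercase" is just an illustrative old collation value (the chat does not say what beta used before uca-default-u-kn).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CategoryLink:
    """Hypothetical stand-in for one category-membership row."""
    page_title: str
    collation: str  # collation the stored sort key was generated with


def rows_to_update(rows: List[CategoryLink], configured: str,
                   force: bool = False) -> List[CategoryLink]:
    """Select rows whose sort keys need regenerating.

    Without force, rows already built with the configured collation are
    skipped; with force, every row is rebuilt regardless.
    """
    if force:
        return list(rows)
    return [row for row in rows if row.collation != configured]


if __name__ == "__main__":
    rows = [
        CategoryLink("Page 2", "uppercase"),          # illustrative: created before the config change
        CategoryLink("Page 10", "uca-default-u-kn"),  # created after the config change
    ]
    stale = rows_to_update(rows, "uca-default-u-kn")
    print([r.page_title for r in stale])
```

Under this model, a category that mixes pages created before and after the deploy keeps its old sort keys for the older pages until the update job actually runs.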
[23:49:11] Dereckson, bawolff: Looks like the patch is working but the old sort keys haven't been regenerated: http://en.wikipedia.beta.wmflabs.org/wiki/Category:Sort_test
[23:49:26] that's a mix of pages created before and after the deploy
[23:49:47] * bawolff doesn't really know how beta cluster works
[23:49:59] Maybe something else triggers update.php to run
[23:50:10] Dereckson: Do you know how I can manually run the update script on Beta Labs?
[23:50:42] or any maintenance script
[23:50:49] (update.php doesn't call updateCollation.php, btw)
[23:51:19] (PS1) Krinkle: contint: Remove 'integration/phpcs' deployment source [puppet] - https://gerrit.wikimedia.org/r/301523
[23:52:21] https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated
[23:53:17] Documentation only summarizes how beta-update-databases-eqiad pushes changes.
[23:53:24] MatmaRex: DatabaseUpdater::doCollationUpdate should call it i think
[23:53:29] ssh deployment-tin.deployment-prep
[23:53:34] which should get called during update.php
[23:53:35] (er... beta-mediawiki-config-update-eqiad)
[23:53:44] then run it like you would in prod
[23:53:57] (except obviously you wouldn't ever run update.php in prod - ever)
[23:54:02] https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases-eqiad/lastBuild/ says last build half an hour ago
[23:55:05] bawolff: :o i could swear that wasn't there last time i looked
[23:55:14] Seems to happen once an hour on the 20
[23:55:20] So should run in 25 minutes
[23:55:43] MatmaRex: It's kind of hidden since there's a lot of indirection in how update.php works
[23:56:52] WMF645:mediawiki-config kaldari$ ssh deployment-tin.deployment-prep.eqiad.wmnet
[23:56:52] channel 0: open failed: administratively prohibited: open failed
[23:56:52] ssh_exchange_identification: Connection closed by remote host
[23:57:13] eqiad.wmflabs
[23:57:13] not wmnet
[23:57:17] ah
[23:57:46] (CR) Thcipriani: scap: make deployment aware of canary machines (1 comment) [puppet] - https://gerrit.wikimedia.org/r/294742 (https://phabricator.wikimedia.org/T110068) (owner: Thcipriani)
[23:59:46] hmm, something's not working right. It's asking me for a password when I ssh there.
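The estimate at 23:55 ("once an hour on the 20", so "should run in 25 minutes") is simple schedule arithmetic. Here is a minimal Python sketch of it, assuming that beta-update-databases-eqiad fires at minute 20 of every hour, as observed in the chat. The job's real trigger lives in its Jenkins configuration, and `next_run` is a hypothetical helper name. The date comes from the SWAT window link (20160727T2300).

```python
from datetime import datetime, timedelta


def next_run(now: datetime, minute: int = 20) -> datetime:
    """Next firing time, strictly after `now`, of a job that runs once an
    hour at `minute` past the hour (assumed schedule)."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # This hour's slot has already passed, so take the next hour's;
        # datetime arithmetic handles rolling over midnight.
        candidate += timedelta(hours=1)
    return candidate


if __name__ == "__main__":
    now = datetime(2016, 7, 27, 23, 55, 20)  # time of "So should run in 25 minutes"
    nxt = next_run(now)
    print(nxt, nxt - now)
```

From 23:55:20 this gives 00:20 the next day, 24 minutes 40 seconds away, which matches the "25 minutes" rounded in the chat.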