[00:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170118T0000).
[00:00:20] !log demon@tin Synchronized multiversion/MWVersion.php: Swap to using MWMultiVersion and make this a fallback (duration: 00m 39s)
[00:00:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:03:41] (CR) Mobrovac: [C: 1] fix incorrect port in ferm rule [puppet] - https://gerrit.wikimedia.org/r/332682 (https://phabricator.wikimedia.org/T155120) (owner: Eevans)
[00:04:26] (PS1) Chad: Swap mobilelanding to use new Multiversion entry [mediawiki-config] - https://gerrit.wikimedia.org/r/332690
[00:06:24] (CR) Mobrovac: [C: 1] Prometheus JMX exporter deploy repository [software/prometheus_jmx_exporter] - https://gerrit.wikimedia.org/r/332542 (https://phabricator.wikimedia.org/T155120) (owner: Eevans)
[00:07:03] (PS2) Chad: Swap ori's `mw` script to using proper entry point [puppet] - https://gerrit.wikimedia.org/r/332648
[00:10:04] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1393
[00:10:04] PROBLEM - check_mysql on frdb1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1244
[00:10:04] PROBLEM - check_mysql on fdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2506
[00:11:57] Operations, Labs, Patch-For-Review, Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2947783 (madhuvishy) All the data will be migrated over and shouldn't need any prior action. If any of the services that are writing to /home or /data/project do...
[00:14:37] (PS1) Chad: Typofix: MWVersion -> MWMultiVersion [puppet] - https://gerrit.wikimedia.org/r/332695
[00:14:55] apergos: That one is a minor typofix from the thing we already did for your runallthescripts thing ^
[00:15:01] (CR) ArielGlenn: "Thanks for documenting the class ;-)" [puppet] - https://gerrit.wikimedia.org/r/332543 (https://phabricator.wikimedia.org/T154940) (owner: Dzahn)
[00:15:04] PROBLEM - check_mysql on fdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2806
[00:15:04] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2807
[00:15:35] ^^^ looking at fundraising replag
[00:15:40] ostriches: ah good catch
[00:16:12] I will get that tomorrow. at 2:15 am I am pretending not to work any more
[00:16:30] well except that I did just make a lame comment on a changeset
[00:16:32] anyways...
[00:16:55] No worries, I'm not removing that file today or anything :)
[00:16:59] (plus it's just a comment)
[00:17:40] yep
[00:20:04] PROBLEM - check_mysql on fdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1846
[00:20:04] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3106
[00:20:04] RECOVERY - check_mysql on frdb1001 is OK: Uptime: 5562042 Threads: 53 Questions: 690994282 Slow queries: 46234 Opens: 27913 Flush tables: 1 Open tables: 597 Queries per second avg: 124.233 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[00:25:04] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3406
[00:25:04] PROBLEM - check_mysql on fdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2146
[00:26:59] (CR) Mobrovac: [C: -1] "LGTM, one minor nit" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/332535
(https://phabricator.wikimedia.org/T155120) (owner: Eevans)
[00:27:21] Reedy: We already redirect wiki.phtml at the apache level, w/wiki.phtml is pretty much useless right?
[00:27:28] (in the docroot, I mean)
[00:29:18] (CR) Chad: [C: 2] Swap mobilelanding to use new Multiversion entry [mediawiki-config] - https://gerrit.wikimedia.org/r/332690 (owner: Chad)
[00:30:04] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2446
[00:30:04] RECOVERY - check_mysql on fdb2001 is OK: Uptime: 2281167 Threads: 1 Questions: 58549888 Slow queries: 12358 Opens: 7462 Flush tables: 2 Open tables: 542 Queries per second avg: 25.666 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[00:32:21] (Merged) jenkins-bot: Swap mobilelanding to use new Multiversion entry [mediawiki-config] - https://gerrit.wikimedia.org/r/332690 (owner: Chad)
[00:35:00] (CR) Eevans: WIP: Enable Prometheus JMX exporter on Cassandra nodes (1 comment) [puppet] - https://gerrit.wikimedia.org/r/332535 (https://phabricator.wikimedia.org/T155120) (owner: Eevans)
[00:35:04] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2746
[00:35:17] (PS7) Eevans: WIP: Enable Prometheus JMX exporter on Cassandra nodes [puppet] - https://gerrit.wikimedia.org/r/332535 (https://phabricator.wikimedia.org/T155120)
[00:35:32] !log demon@tin Synchronized w/mobilelanding.php: Last major fix for multiversion (duration: 00m 45s)
[00:35:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:40:04] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 3046
[00:40:09] (PS1) Chad: Remove wiki.phtml [mediawiki-config] - https://gerrit.wikimedia.org/r/332698
[00:41:56] (CR) Volans: "Puppet compiler of the last revision: https://puppet-compiler.wmflabs.org/5136/"
[puppet] - https://gerrit.wikimedia.org/r/330436 (https://phabricator.wikimedia.org/T154588) (owner: Volans)
[00:45:01] (CR) Chad: "Not *quite* working yet, get warnings about running it from a non-CLI application." [puppet] - https://gerrit.wikimedia.org/r/332673 (owner: Chad)
[00:45:04] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2617
[00:50:04] RECOVERY - check_mysql on lutetium is OK: Uptime: 4355284 Threads: 3 Questions: 590357284 Slow queries: 21363 Opens: 94699239 Flush tables: 2 Open tables: 64 Queries per second avg: 135.549 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[00:53:58] !log mobrovac@tin Starting deploy [trending-edits/deploy@1d53b7c]: fixes for T153122 and T145571
[00:54:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:54:03] T145571: Recovery: Trending service should be able to replay last 1hr of edits - https://phabricator.wikimedia.org/T145571
[00:54:04] T153122: Investigate delay growth in trending service - https://phabricator.wikimedia.org/T153122
[00:55:16] (PS1) Chad: MWMultiVersion -> self [mediawiki-config] - https://gerrit.wikimedia.org/r/332704
[00:55:40] (CR) Mobrovac: [C: 1] "Looks good. I like the simplicity of the boolean flag." [puppet] - https://gerrit.wikimedia.org/r/332535 (https://phabricator.wikimedia.org/T155120) (owner: Eevans)
[00:59:04] !log mobrovac@tin Finished deploy [trending-edits/deploy@1d53b7c]: fixes for T153122 and T145571 (duration: 05m 06s)
[00:59:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:59:09] T145571: Recovery: Trending service should be able to replay last 1hr of edits - https://phabricator.wikimedia.org/T145571
[00:59:10] T153122: Investigate delay growth in trending service - https://phabricator.wikimedia.org/T153122
[01:04:54] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures.
Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service]
[01:12:21] I can't log into sca1004 ^
[01:17:19] !log mobrovac@tin Starting deploy [citoid/deploy@9f93a00]: (no message)
[01:17:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:17:49] Operations, Discovery, Traffic, Wikidata, and 2 others: Consider switching to HTTPS for Wikidata query service links - https://phabricator.wikimedia.org/T153563#2884289 (Ricordisamoa) Is it advisable to use statements like `strafter(str(?item), str(wd:))` to avoid hard-coding URI prefixes within...
[01:18:08] (PS3) Madhuvishy: nfs-mounts: Remove wikidata-quality from nfs-mount yaml [puppet] - https://gerrit.wikimedia.org/r/330173
[01:19:22] (CR) Madhuvishy: [C: 2] nfs-mounts: Remove wikidata-quality from nfs-mount yaml [puppet] - https://gerrit.wikimedia.org/r/330173 (owner: Madhuvishy)
[01:21:33] !log mobrovac@tin Finished deploy [citoid/deploy@9f93a00]: (no message) (duration: 04m 14s)
[01:21:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:24:54] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures.
Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service]
[01:26:56] (PS1) Chad: Swap from protocol-relative urls to https everywhere [puppet] - https://gerrit.wikimedia.org/r/332707
[01:28:57] (PS5) Madhuvishy: nfs: Dual mount misc projects from labstore-secondary cluster [puppet] - https://gerrit.wikimedia.org/r/329711 (https://phabricator.wikimedia.org/T154336)
[01:29:11] (CR) Madhuvishy: [V: 2 C: 2] nfs: Dual mount misc projects from labstore-secondary cluster [puppet] - https://gerrit.wikimedia.org/r/329711 (https://phabricator.wikimedia.org/T154336) (owner: Madhuvishy)
[01:29:53] (PS1) RobH: update icinga cert check to letsencrypt for librenms [puppet] - https://gerrit.wikimedia.org/r/332709
[01:32:40] Operations, Traffic: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717#2948104 (RobH)
[01:32:44] Operations, Traffic, Patch-For-Review: convert librenms.wikimedia.org from GS to LE cert (expires: 2017-02-11) - https://phabricator.wikimedia.org/T154919#2948103 (RobH)
[01:32:54] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[01:33:02] (PS1) Volans: Add missing comment to sca2* Ganeti instances [dns] - https://gerrit.wikimedia.org/r/332710
[01:52:54] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[02:10:44] PROBLEM - puppet last run on elastic1032 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[02:15:24] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[02:16:24] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2969434 keys, up 78 days 17 hours - replication_delay is 0
[02:29:29] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.7) (duration: 08m 23s)
[02:29:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:38:44] RECOVERY - puppet last run on elastic1032 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[02:50:27] Operations, Labs, Patch-For-Review, Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2948310 (madhuvishy)
[02:51:14] Operations, Labs, Patch-For-Review, Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2907785 (madhuvishy) Removed wikidata-dev from list of affected projects - It has nfs-mount turned off explicitly via hiera - https://wikitech.wikimedia.org/wiki...
[03:00:09] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.8) (duration: 13m 22s)
[03:00:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:00:27] (PS2) Jforrester: [WIP] Add composer test for coding standards and try to pass [mediawiki-config] - https://gerrit.wikimedia.org/r/271936
[03:00:36] (CR) jerkins-bot: [V: -1] [WIP] Add composer test for coding standards and try to pass [mediawiki-config] - https://gerrit.wikimedia.org/r/271936 (owner: Jforrester)
[03:00:40] (CR) Jforrester: "PS2 is just the human-made changes to PS1, modernised."
[mediawiki-config] - https://gerrit.wikimedia.org/r/271936 (owner: Jforrester)
[03:05:47] !log l10nupdate@tin ResourceLoader cache refresh completed at Wed Jan 18 03:05:47 UTC 2017 (duration 5m 38s)
[03:05:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:12:30] (PS1) TTO: Set wgDisableUserGroupExpiry to true on production, false on labs [mediawiki-config] - https://gerrit.wikimedia.org/r/332721 (https://phabricator.wikimedia.org/T155605)
[03:16:34] (PS2) Tim Landscheidt: postgresql: Only set user password if different [puppet] - https://gerrit.wikimedia.org/r/329328
[03:23:54] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 662.67 seconds
[03:30:54] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 269.68 seconds
[04:40:04] PROBLEM - puppet last run on labstore1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:08:04] RECOVERY - puppet last run on labstore1002 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[06:06:32] Operations, Analytics, ChangeProp, Citoid, and 12 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2948472 (GWicke) Over the last ~10 hours we have not seen any issues with node 6 & RESTBase. As expected, the most noticeable impact is significantly reduced memory usage:...
[06:31:54] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:43:54] PROBLEM - Check HHVM threads for leakage on mw1259 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:44:04] PROBLEM - Check HHVM threads for leakage on mw1169 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:54:04] PROBLEM - puppet last run on elastic2005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tzdata]
[07:01:14] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:04:42] (PS1) Madhuvishy: nfsclient: Setup symlinks for /data/project and /home on labs projects from secondary nfs cluster [puppet] - https://gerrit.wikimedia.org/r/332735
[07:04:54] RECOVERY - Check HHVM threads for leakage on mw1259 is OK: OK
[07:10:16] Operations, ops-codfw, DBA, Patch-For-Review: db2060 crashed (RAID controller) - https://phabricator.wikimedia.org/T154031#2948528 (Marostegui) After the reboot the Cache looks good now ``` Cache Status: OK ``` Going to repool the server for now as it looks stable for the past few weeks.
[07:10:43] (CR) Marostegui: [C: 2] Revert "db-codfw.php: Depool db2060" [mediawiki-config] - https://gerrit.wikimedia.org/r/332495 (owner: Marostegui)
[07:15:17] (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2060" [mediawiki-config] - https://gerrit.wikimedia.org/r/332495 (owner: Marostegui)
[07:19:20] Operations, DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters: Prepare and improve the datacenter switchover procedure - https://phabricator.wikimedia.org/T154658#2948550 (Joe)
[07:22:04] RECOVERY - puppet last run on elastic2005 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[07:25:18] !log Restart MySQL dbstore2001 to apply InnoDB defaults
[07:25:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:29:14] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[07:30:18] Operations, Labs, Patch-For-Review, Tracking: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336#2948556 (madhuvishy)
[07:30:35] !log oblivian@puppetmaster1001 conftool action : set/pooled=inactive; selector: service=nginx,cluster=api_appserver,dc=eqiad,name=mw12[7-9].*
[07:30:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:32:56] <_joe_> !log depooling mw1226-mw1235 from the https pool in eqiad, T152074
[07:33:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:33:00] T152074: Separate clusters for asynchronous processing from the ones for public consumption - https://phabricator.wikimedia.org/T152074
[07:34:13] !log oblivian@puppetmaster1001 conftool action : set/pooled=no; selector: service=nginx,cluster=api_appserver,dc=eqiad,name=mw122[6-9].*
[07:34:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:43] (CR) Marostegui: "For the record: I have restarted
dbstore2001 mysql to manually apply the variables to make innodb the default storage engine as well as in" [puppet] - https://gerrit.wikimedia.org/r/332228 (https://phabricator.wikimedia.org/T130128) (owner: Marostegui)
[07:40:19] Operations, netops: cr2-esams<->cr2-eqiad link flaps - https://phabricator.wikimedia.org/T154577#2948560 (faidon) Open>Resolved a:faidon I was just on a lengthy phone call with Level3. This seems to have been a combination of issues with a 100G card ("fixed" by a card reset) that was done ori...
[07:41:54] Operations, netops: Packet loss from Voxel to text load balancers - https://phabricator.wikimedia.org/T153998#2948566 (faidon) stalled>declined Since this was a user on IRC I doubt we'll hear much soon. Declining for now, feel free to reopen if the issue persists and we hear back from this or ano...
[07:48:12] (PS1) Urbanecm: [throttle] Add one rule [mediawiki-config] - https://gerrit.wikimedia.org/r/332738 (https://phabricator.wikimedia.org/T154312)
[07:48:54] RECOVERY - Check HHVM threads for leakage on mw1168 is OK: OK
[07:49:06] RECOVERY - Check HHVM threads for leakage on mw1169 is OK: OK
[07:56:30] (PS2) Muehlenhoff: Grant access to analytics-privatedata-users to demon [puppet] - https://gerrit.wikimedia.org/r/331925 (https://phabricator.wikimedia.org/T155198) (owner: Chad)
[07:56:57] <_joe_> !log restarting pybal on lvs1003
[07:57:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:57] (CR) Muehlenhoff: [C: 2] Grant access to analytics-privatedata-users to demon [puppet] - https://gerrit.wikimedia.org/r/331925 (https://phabricator.wikimedia.org/T155198) (owner: Chad)
[08:00:45] <_joe_> !log restarting pybal on lvs1003
[08:00:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:08:43] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to hive/webrequest data for demon -
https://phabricator.wikimedia.org/T155198#2948603 (MoritzMuehlenhoff) Open>Resolved @demon You should now be able to log into stat1004.eqiad.wmnet. Ping me on IRC if you run into any probl...
[08:10:34] PROBLEM - Host puppetmaster1002 is DOWN: PING CRITICAL - Packet loss = 100%
[08:11:24] RECOVERY - Host puppetmaster1002 is UP: PING OK - Packet loss = 0%, RTA = 0.42 ms
[08:13:43] (PS6) Juniorsys: geowiki module: Lint changes + modes/umask quoting [puppet] - https://gerrit.wikimedia.org/r/332101 (https://phabricator.wikimedia.org/T93645)
[08:13:52] (PS5) Juniorsys: mediawiki module: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332103 (https://phabricator.wikimedia.org/T93645)
[08:14:03] (PS5) Juniorsys: postgresql module: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332104 (https://phabricator.wikimedia.org/T93645)
[08:14:21] (PS5) Juniorsys: puppetmaster module: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332105 (https://phabricator.wikimedia.org/T93645)
[08:16:01] !log Compressing templatelinks tables on db1038 - T154465
[08:16:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:16:06] T154465: Defragment db1038 - https://phabricator.wikimedia.org/T154465
[08:16:27] (PS5) Juniorsys: role analytics_cluster: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332106 (https://phabricator.wikimedia.org/T93645)
[08:16:51] (PS5) Juniorsys: toollabs role modules: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332110 (https://phabricator.wikimedia.org/T93645)
[08:17:07] (PS5) Juniorsys: toollabs module: Linting changes [puppet] - https://gerrit.wikimedia.org/r/332111
[08:18:45] !log Compressing templatelinks tables on db1035 - T154465
[08:18:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:20:45] (PS5) Juniorsys: ganglia module: Use full names for class names [puppet] -
https://gerrit.wikimedia.org/r/332100 (https://phabricator.wikimedia.org/T93645)
[08:25:56] (CR) Juniorsys: "Should be fixed" [puppet] - https://gerrit.wikimedia.org/r/332100 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[08:26:54] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:27:04] PROBLEM - puppet last run on mw1232 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:33:16] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2060 - T154031 (duration: 00m 40s)
[08:33:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:21] T154031: db2060 crashed (RAID controller) - https://phabricator.wikimedia.org/T154031
[08:34:36] Operations, ops-codfw, DBA, Patch-For-Review: db2060 crashed (RAID controller) - https://phabricator.wikimedia.org/T154031#2948654 (Marostegui) Open>Resolved a:jcrespo>Marostegui
[08:38:25] (PS3) Marostegui: mariadb: Enable gtid_domain_id - phabricator hosts [puppet] - https://gerrit.wikimedia.org/r/326446 (https://phabricator.wikimedia.org/T149418)
[08:40:14] RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[08:42:08] (PS1) Marostegui: db-codfw.php: Depool db2063 [mediawiki-config] - https://gerrit.wikimedia.org/r/332744 (https://phabricator.wikimedia.org/T154097)
[08:44:52] (CR) Hashar: "I missed your patch sorry. I don't think DirectoryIndex would work when browsing an empty sub dir such as /cover/visualeditor/ .
I did tr" [puppet] - https://gerrit.wikimedia.org/r/331558 (https://phabricator.wikimedia.org/T150727) (owner: Krinkle)
[08:45:21] Operations, Parsoid: Parsoid unable to parse specific user page - https://phabricator.wikimedia.org/T155618#2948681 (Joe)
[08:45:28] (CR) Marostegui: [C: 2] db-codfw.php: Depool db2063 [mediawiki-config] - https://gerrit.wikimedia.org/r/332744 (https://phabricator.wikimedia.org/T154097) (owner: Marostegui)
[08:45:36] o/
[08:45:43] jouncebot: next
[08:45:43] In 5 hour(s) and 14 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170118T1400)
[08:49:13] (Merged) jenkins-bot: db-codfw.php: Depool db2063 [mediawiki-config] - https://gerrit.wikimedia.org/r/332744 (https://phabricator.wikimedia.org/T154097) (owner: Marostegui)
[08:50:44] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2063 - T154097 (duration: 00m 39s)
[08:50:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:50:48] T154097: Remove partitions from enwiktionary.templatelinks in s2 - https://phabricator.wikimedia.org/T154097
[08:51:12] !log Remove partitions from enwiktionary.templatelinks on db2063 - T154097
[08:51:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:08] Operations, Analytics, ChangeProp, Citoid, and 12 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2758922 (hashar) That is great! I had T121850 about RESTBase emitting `Heap memory limit exceed`. The last one is at 2017-01-17T18:10:12 (logstash for 7 days https://logs...
[08:53:27] Operations, Parsoid: Parsoid timing out or failing when trying to parse specific user page - https://phabricator.wikimedia.org/T155618#2948707 (Joe)
[08:54:01] Operations, Parsoid, User-Joe: Parsoid timing out or failing when trying to parse specific user page - https://phabricator.wikimedia.org/T155618#2948681 (Joe) a:Joe
[08:54:54] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[08:55:37] (CR) Elukey: [C: -1] "Everything looks good and PCC is happy, there are only some includes that are missing the :: prefix in my opinion." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/332106 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[09:04:13] Operations, Parsoid, User-Joe: Parsoid timing out or failing when trying to parse specific user page - https://phabricator.wikimedia.org/T155618#2948721 (Joe) Isolating a single request, I see that most of the time is spent in executing `v8::internal::VisitWeakList` and th...
[09:10:30] (PS2) Dereckson: Add throttle rule for KCES IMR edit-a-thon [mediawiki-config] - https://gerrit.wikimedia.org/r/332738 (https://phabricator.wikimedia.org/T154312) (owner: Urbanecm)
[09:30:10] (PS1) Marostegui: m1,m3,m4.hosts: Add new host files [software] - https://gerrit.wikimedia.org/r/332747
[09:31:00] Operations, Parsoid, User-Joe: Parsoid timing out or failing when trying to parse specific user page - https://phabricator.wikimedia.org/T155618#2948768 (Joe) Strace gives little more information, besides the fact for each of these pages parsoid does hundreds of preprocessing requests to the MW API....
[09:33:18] (CR) Muehlenhoff: [C: 2] Fix debian's lintian test [debs/gerrit] - https://gerrit.wikimedia.org/r/331873 (owner: Paladox)
[09:40:37] !log Restart mysql dbstore2002 to enable gtid_domain_id manually before deploying it on m3 - T149418
[09:40:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:42] T149418: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418
[09:45:49] (PS1) Muehlenhoff: Add subsequently assigned CVE ID to changelog [debs/linux44] - https://gerrit.wikimedia.org/r/332748
[09:47:55] (CR) Muehlenhoff: [C: 2] Add subsequently assigned CVE ID to changelog [debs/linux44] - https://gerrit.wikimedia.org/r/332748 (owner: Muehlenhoff)
[09:51:17] (PS1) Marostegui: Revert "db-codfw.php: Depool db2063" [mediawiki-config] - https://gerrit.wikimedia.org/r/332749
[09:53:47] (CR) Marostegui: [C: 2] Revert "db-codfw.php: Depool db2063" [mediawiki-config] - https://gerrit.wikimedia.org/r/332749 (owner: Marostegui)
[09:55:24] (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2063" [mediawiki-config] - https://gerrit.wikimedia.org/r/332749 (owner: Marostegui)
[09:56:57] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2063 - T154097 (duration: 00m 48s)
[09:57:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:02] T154097: Remove partitions from enwiktionary.templatelinks in s2 - https://phabricator.wikimedia.org/T154097
[09:59:34] (PS1) Marostegui: db-codfw.php: Depool db2064 [mediawiki-config] - https://gerrit.wikimedia.org/r/332751 (https://phabricator.wikimedia.org/T154097)
[10:02:34] (CR) Marostegui: [C: 2] db-codfw.php: Depool db2064 [mediawiki-config] - https://gerrit.wikimedia.org/r/332751 (https://phabricator.wikimedia.org/T154097) (owner: Marostegui)
[10:04:03] (Merged) jenkins-bot: db-codfw.php: Depool db2064 [mediawiki-config] -
https://gerrit.wikimedia.org/r/332751 (https://phabricator.wikimedia.org/T154097) (owner: Marostegui)
[10:04:28] (PS2) Filippo Giunchedi: fix incorrect port in ferm rule [puppet] - https://gerrit.wikimedia.org/r/332682 (https://phabricator.wikimedia.org/T155120) (owner: Eevans)
[10:05:13] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2064 - T154097 (duration: 00m 45s)
[10:05:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:05:17] T154097: Remove partitions from enwiktionary.templatelinks in s2 - https://phabricator.wikimedia.org/T154097
[10:05:28] !log Remove partitions from enwiktionary.templatelinks on db2064 - T154097
[10:05:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:10:03] !log Restart mysql dbstore2001 to enable gtid_domain_id manually before deploying it on m3 - T149418
[10:10:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:10:08] T149418: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418
[10:11:07] !log pool ms-fe200[789] T152612
[10:11:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:12] T152612: codfw: rack/setup ms-fe200[5-8] - https://phabricator.wikimedia.org/T152612
[10:17:35] (CR) Filippo Giunchedi: [C: 2] fix incorrect port in ferm rule [puppet] - https://gerrit.wikimedia.org/r/332682 (https://phabricator.wikimedia.org/T155120) (owner: Eevans)
[10:29:07] (CR) Giuseppe Lavagetto: [C: 2] contint: add python-conftool [puppet] - https://gerrit.wikimedia.org/r/332477 (owner: Hashar)
[10:29:16] (PS2) Giuseppe Lavagetto: contint: add python-conftool [puppet] - https://gerrit.wikimedia.org/r/332477 (owner: Hashar)
[10:29:22] (CR) Filippo Giunchedi: "See comment about scap::dsh::groups, LGTM other than that" (1 comment) [software/prometheus_jmx_exporter] - https://gerrit.wikimedia.org/r/332542
(https://phabricator.wikimedia.org/T155120) (owner: Eevans)
[10:31:00] (CR) Filippo Giunchedi: [C: 1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/332535 (https://phabricator.wikimedia.org/T155120) (owner: Eevans)
[10:37:43] (PS3) Marostegui: mariadb: Split dbstore role classes [puppet] - https://gerrit.wikimedia.org/r/332228 (https://phabricator.wikimedia.org/T130128)
[10:38:40] !log oblivian@puppetmaster1001 conftool action : set/pooled=no; selector: service=nginx,cluster=api_appserver,dc=eqiad,name=mw123[0-5].*
[10:38:43] !log installing libxml security updates
[10:38:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:43:29] (CR) Hashar: [C: -1] "recheck" [puppet] - https://gerrit.wikimedia.org/r/332475 (owner: Hashar)
[10:43:58] (PS1) Filippo Giunchedi: graphite: fix upload vs uploads alert for reqstats [puppet] - https://gerrit.wikimedia.org/r/332756
[10:45:33] (CR) jenkins-bot: Swap mobilelanding to use new Multiversion entry [mediawiki-config] - https://gerrit.wikimedia.org/r/332690 (owner: Chad)
[10:45:46] (CR) jenkins-bot: db-codfw.php: Depool db2063 [mediawiki-config] - https://gerrit.wikimedia.org/r/332744 (https://phabricator.wikimedia.org/T154097) (owner: Marostegui)
[10:46:18] (CR) Filippo Giunchedi: [V: 2 C: 2] graphite: fix upload vs uploads alert for reqstats [puppet] - https://gerrit.wikimedia.org/r/332756 (owner: Filippo Giunchedi)
[10:46:21] (CR) jenkins-bot: db-codfw.php: Depool db2064 [mediawiki-config] - https://gerrit.wikimedia.org/r/332751 (https://phabricator.wikimedia.org/T154097) (owner: Marostegui)
[10:46:24] (PS2) Filippo Giunchedi: graphite: fix upload vs uploads alert for reqstats [puppet] - https://gerrit.wikimedia.org/r/332756
[10:46:31] (CR) jenkins-bot: Revert "db-codfw.php: Depool db2060" [mediawiki-config] -
10https://gerrit.wikimedia.org/r/332495 (owner: 10Marostegui) [10:47:01] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] graphite: fix upload vs uploads alert for reqstats [puppet] - 10https://gerrit.wikimedia.org/r/332756 (owner: 10Filippo Giunchedi) [10:49:58] (03CR) 10jerkins-bot: [V: 04-1] wmflib: switch to puppetlabs_spec_helper/rake_tasks [puppet] - 10https://gerrit.wikimedia.org/r/332475 (owner: 10Hashar) [10:53:35] (03CR) 10jenkins-bot: Revert "db-codfw.php: Depool db2063" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332749 (owner: 10Marostegui) [10:58:12] (03CR) 10jenkins-bot: Turn getMediaWiki() into back-compat to MWMultiVersion::getMediaWiki() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332666 (owner: 10Chad) [11:09:58] !log restarting mediawiki canary servers to pick up cairo and libpng updates [11:10:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:13:03] (03PS1) 10Marostegui: Revert "db-codfw.php: Depool db2064" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332757 [11:15:02] (03CR) 10Marostegui: [C: 032] Revert "db-codfw.php: Depool db2064" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332757 (owner: 10Marostegui) [11:16:09] (03Merged) 10jenkins-bot: Revert "db-codfw.php: Depool db2064" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332757 (owner: 10Marostegui) [11:17:27] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2064 - T154097 (duration: 00m 39s) [11:17:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:17:32] T154097: Remove partitions from enwiktionary.templatelinks in s2 - https://phabricator.wikimedia.org/T154097 [11:18:01] (03CR) 10jenkins-bot: Revert "db-codfw.php: Depool db2064" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332757 (owner: 10Marostegui) [11:23:30] (03CR) 10Paladox: "thanks." 
[debs/gerrit] - 10https://gerrit.wikimedia.org/r/331873 (owner: 10Paladox) [11:38:12] PROBLEM - puppet last run on logstash1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:38:34] (03CR) 10Hashar: "_joe_ modules/wmflib/spec/functions/conftool_spec.rb fails invoking conftool:" [puppet] - 10https://gerrit.wikimedia.org/r/332475 (owner: 10Hashar) [11:42:03] !log installing sed bugfix updates from jessie point release [11:42:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:43:00] 06Operations, 10Traffic, 10Wikidata, 07HTTPS: wikiba.se should use HTTPS - https://phabricator.wikimedia.org/T155359#2948949 (10hashar) Per H131 whenever a task has the tag #HTTPS associated to it, #Traffic is automatically added. Then on a next edition #operations is added because #Traffic is present. [11:43:03] PROBLEM - Check systemd state on restbase-dev1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [11:43:12] PROBLEM - cassandra-a service on restbase-dev1001 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed [11:54:52] is this an expired downtime? [11:55:04] it seems to me that cassandra has been failing for a while [11:55:17] not sure what the status of the new testing cluster though [11:55:22] godog: --^ [11:55:27] nothing urgent [12:06:12] RECOVERY - puppet last run on logstash1006 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [12:06:29] !log installing libio-socket-ssl-perl bugfix updates from jessie point release [12:06:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:22] PROBLEM - puppet last run on cp3037 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [12:13:49] (03PS4) 10Marostegui: mariadb: Enable gtid_domain_id - phabricator hosts [puppet] - 10https://gerrit.wikimedia.org/r/326446 (https://phabricator.wikimedia.org/T149418) [12:15:49] (03CR) 10Marostegui: [C: 032] mariadb: Enable gtid_domain_id - phabricator hosts [puppet] - 10https://gerrit.wikimedia.org/r/326446 (https://phabricator.wikimedia.org/T149418) (owner: 10Marostegui) [12:21:15] !log Enable gtid_domain_id on m3 - T149418 [12:21:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:20] T149418: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418 [12:35:22] RECOVERY - puppet last run on cp3037 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [13:04:12] RECOVERY - cassandra-a service on restbase-dev1001 is OK: OK - cassandra-a is active [13:07:12] PROBLEM - cassandra-a service on restbase-dev1001 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed [13:12:25] !log uploaded firejail 0.9.44.6 for jessie-wikimedia to carbon [13:12:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:24:38] (03PS3) 10Hashar: [WIP] test job jenkins with mw-core [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320980 (https://phabricator.wikimedia.org/T115713) (owner: 10DCausse) [13:25:41] (03CR) 10jerkins-bot: [V: 04-1] [WIP] test job jenkins with mw-core [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320980 (https://phabricator.wikimedia.org/T115713) (owner: 10DCausse) [13:35:58] elukey: yeah expired downtime, thanks! [13:36:00] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#2949078 (10faidon) Ping! 
Jan 25 is a week away from now, not a lot of time left for an announcement :) [13:37:54] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#2949082 (10mark) p:05Normal>03High [13:38:35] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#2949083 (10yuvipanda) I didn't manage to send out the announcement due to unforseen personal issues. I'll send it out now after checking with jynus. [13:50:32] jouncebot: next [13:50:32] In 0 hour(s) and 9 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170118T1400) [13:56:59] Hello here. [13:57:01] jouncebot: refresh [13:57:03] I refreshed my knowledge about deployments. [13:57:51] Just me for SWAT today, it seems [13:58:16] (03PS6) 10Paladox: Gerrit: Set useUnicode=true, also change connectionCollation to utf8mb4_unicode_ci [puppet] - 10https://gerrit.wikimedia.org/r/330455 (https://phabricator.wikimedia.org/T145885) [13:58:44] (03PS10) 10Paladox: Gerrit: Convert from utf8 to utf8mb4 [puppet] - 10https://gerrit.wikimedia.org/r/328571 (https://phabricator.wikimedia.org/T153899) [13:59:01] oh I've added a change for the 17 instead of the 18 [13:59:09] (03Abandoned) 10Paladox: Gerrit: Convert from utf8 to utf8mb4 [puppet] - 10https://gerrit.wikimedia.org/r/328571 (https://phabricator.wikimedia.org/T153899) (owner: 10Paladox) [14:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170118T1400). Please do the needful. [14:00:04] tto: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process. 
[14:00:18] thanks jouncebot [14:00:41] tto: https://gerrit.wikimedia.org/r/#/c/331496/ should be backported to wmf7, wmf8 or both? [14:01:01] you made an error of change by the way [14:01:10] you wanted to include 'Set wgDisableUserGroupExpiry to true on production, false on labs' [14:01:23] Sorry, got the wrong number [14:01:33] and 'Increase $wgHTTPImportTimeout to 50 seconds' [14:01:41] 332721 is right. I'll find the correct number for the import change [14:01:56] https://gerrit.wikimedia.org/r/331946 [14:02:07] ^ that's the import change. Sorry about that [14:02:11] o/ [14:02:28] I +2ed the importUpload sleep patch [14:02:40] hashar: you are in charge of swat today? [14:02:58] if you have time for it I would not mind skipping :D [14:03:12] sure, can do [14:03:48] !log upgrading firejail on aqs cluster [14:03:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:04:27] zeljkof: so for config, it's https://gerrit.wikimedia.org/r/#/c/331946/ and https://gerrit.wikimedia.org/r/#/c/332721/ the two TTO changes [14:04:57] zeljkof: mine isn't testable on terbium, but I'll test it after SWAT, it only affects a maintenance script, script I need to run afterwards [14:04:57] so for the record: I can swat today! [14:05:10] read: mine isn't testable on mwdebug1002 [14:05:33] Dereckson: ok [14:05:49] tto: are you commits testable at mwdebug1002? [14:06:21] zeljkof: The import one is. The wgDisableUserGroupExpiry is a no-op (setting a feature flag for yet-to-be-merged core patch) [14:07:13] tto: ok, will ping you when the patch is at mwdebug1002 [14:07:18] Sure. 
Thanks [14:08:52] (03PS2) 10ArielGlenn: Typofix: MWVersion -> MWMultiVersion [puppet] - 10https://gerrit.wikimedia.org/r/332695 (owner: 10Chad) [14:09:52] (03CR) 10ArielGlenn: [C: 032] Typofix: MWVersion -> MWMultiVersion [puppet] - 10https://gerrit.wikimedia.org/r/332695 (owner: 10Chad) [14:10:17] tto: um, looks like the import patch links to the wrong commit in gerrit [14:10:34] 14:04:27 < Dereckson> zeljkof: so for config, it's https://gerrit.wikimedia.org/r/#/c/331946/ and https://gerrit.wikimedia.org/r/#/c/332721/ the two TTO [14:10:37] changes [14:10:49] "[config] 331496 Increase $wgHTTPImportTimeout to 50 seconds" > "Add grunt-jsonlint and grunt-banana-checker" [14:10:59] Dereckson: I see, thanks [14:11:02] https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=1342228&oldid=1342183 [14:11:10] I fixed the deployment table by the way. [14:11:13] Sorry for the confusion, a simple typo [14:11:47] thanks, will refresh, let's try again [14:14:05] (03PS3) 10Zfilipin: Increase $wgHTTPImportTimeout to 50 seconds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331946 (https://phabricator.wikimedia.org/T155209) (owner: 10TTO) [14:16:14] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331946 (https://phabricator.wikimedia.org/T155209) (owner: 10TTO) [14:17:45] (03Merged) 10jenkins-bot: Increase $wgHTTPImportTimeout to 50 seconds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331946 (https://phabricator.wikimedia.org/T155209) (owner: 10TTO) [14:18:10] (03PS2) 10Zfilipin: Set wgDisableUserGroupExpiry to true on production, false on labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332721 (https://phabricator.wikimedia.org/T155605) (owner: 10TTO) [14:18:22] (03CR) 10jenkins-bot: Increase $wgHTTPImportTimeout to 50 seconds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/331946 (https://phabricator.wikimedia.org/T155209) (owner: 10TTO) [14:19:53] tto: 331946 is at mwdebug1002, 
please test and let me know if I can proceed [14:20:07] zeljkof, will test. Might be 10 or 15 minutes before I can say yes or no [14:21:09] tto: hm, maybe this one should be the last then, but it is done now [14:21:57] ok, your next change does not touch the same file, I will continue with that [14:23:38] tto: 332721 is not testable at mwdebug1002, right? should be deployed straight to prod? [14:24:00] zeljkof: That's correct. [14:25:25] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332721 (https://phabricator.wikimedia.org/T155605) (owner: 10TTO) [14:26:59] (03Merged) 10jenkins-bot: Set wgDisableUserGroupExpiry to true on production, false on labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332721 (https://phabricator.wikimedia.org/T155605) (owner: 10TTO) [14:28:30] (03CR) 10jenkins-bot: Set wgDisableUserGroupExpiry to true on production, false on labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332721 (https://phabricator.wikimedia.org/T155605) (owner: 10TTO) [14:28:31] (03PS1) 10Muehlenhoff: Stick with node 4.6 on maps due to karthotherian not being ready for node 6 [puppet] - 10https://gerrit.wikimedia.org/r/332768 (https://phabricator.wikimedia.org/T149331) [14:30:07] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:332721|Set wgDisableUserGroupExpiry to true on production, false on labs (T155605)]] (duration: 00m 40s) [14:30:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:30:12] T155605: Schema changes for expiring user groups - https://phabricator.wikimedia.org/T155605 [14:30:55] Still testing...... 
[14:31:05] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:332721|Set wgDisableUserGroupExpiry to true on production, false on labs (T155605)]] (duration: 00m 40s) [14:31:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:31:38] tto: 332721 is deployed, let me know when you are done testing 331946 [14:32:56] Dereckson: I see hashar has already merged 332766, it is not deployed, right? [14:33:10] * zeljkof does not see it in the log as deployed [14:33:58] and you said it is not testable at mwdebug1002, right? [14:34:02] RECOVERY - Check systemd state on restbase-dev1001 is OK: OK - running: The system is fully operational [14:34:10] zeljkof: yeah I haven't deployed it [14:34:29] hashar: ok, deploying it then [14:34:36] zeljkof: I have reviewed/CR+2 so we do not have to wait for test results :} [14:34:44] hashar: great, thanks [14:35:28] (03PS7) 10Elukey: Add JVM Heap usage alarms for basic Hadoop daemons [puppet] - 10https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) [14:36:32] hashar: hm, I don't think I have ever deployed mw/core :) [14:37:02] PROBLEM - Check systemd state on restbase-dev1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [14:37:14] zeljkof, I'm calling it a success :) [14:37:22] Thanks for both the deploys! [14:38:00] tto: 331946 works? I can deploy it to the production? 
it is at mwdebug1002 only so far [14:38:44] zeljkof: Yes please [14:38:54] Dereckson: regarding the rename, if you are in mood let me know le.go seems away :( [14:39:31] !log zfilipin@tin Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:331946|Increase $wgHTTPImportTimeout to 50 seconds (T155209)]] (duration: 00m 39s) [14:39:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:35] T155209: Increase $wgHTTPImportTimeout to a higher value on WMF wikis - https://phabricator.wikimedia.org/T155209 [14:39:58] tto: 331946 is deployed [14:40:09] Thanks again! [14:41:08] tto: no problem, thanks for deploying with #releng ;) [14:42:50] hashar: around to help out if I get stuck with 332766? I have not deployed mw/core so far [14:43:01] yeah [14:43:31] cd /srv/mediawiki-staging/php-1.29.0-wmf.7 [14:43:33] git fetch [14:43:38] git log HEAD..HEAD@{u} [14:43:41] hashar: I don't think Dereckson is around at the moment for 332766, should I wait until he is back? [14:43:46] review what is going to be added then git rebase [14:44:10] scap sync-file php-1.29.0-wmf.7/maintenance/importImages.php 'maintenance/importImages: Don't sleep after the last upload' [14:44:20] na change is fine [14:44:22] hashar: so pretty much the same as config deploys? [14:44:25] I will assist [14:45:39] * Dereckson is there. [14:45:51] Steinsplitter: sure, we can do that [14:45:59] Dereckson: great :) your commit should go directly to prod, right? [14:46:07] * Dereckson nods [14:46:20] I'll test it afterwards on Terbium with a pending upload task [14:46:54] but that's too long to test on mwdebug1002 (and it won't affect anything, as a maintenance script manually triggered) [14:46:57] Dereckson: it is https://commons.wikimedia.org/wiki/Special:CentralAuth/Tabbelio , if you give a OK i will start. 
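Note on the `scap sync-file` command suggested at 14:44:10: as written, the apostrophe in "Don't" sits inside a single-quoted string, which a POSIX shell reads as the end of that string. The message finally logged at 15:23:48 reads "Dont", without the apostrophe. The sketch below is only an illustration of the quoting behaviour; `count_args` is a hypothetical stand-in for `scap`, which is not invoked here.

```shell
#!/bin/sh
# Hypothetical stand-in for scap: reports how many arguments it received.
count_args() { echo "$#"; }

# As pasted in the channel: the apostrophe closes the single-quoted string,
# and the final quote opens a new one that is never closed, so the command
# does not even parse. ':' is a no-op; only the parsing matters here.
if sh -c ": 'maintenance/importImages: Don't sleep after the last upload'" 2>/dev/null; then
    pasted=parsed
else
    pasted=unterminated
fi

# Option 1: double quotes keep the apostrophe literal.
msg="maintenance/importImages: Don't sleep after the last upload"

# Option 2: stay in single quotes and splice in an escaped apostrophe.
alt='maintenance/importImages: Don'\''t sleep after the last upload'

# The file path and the message arrive as exactly two arguments.
n=$(count_args php-1.29.0-wmf.7/maintenance/importImages.php "$msg")

echo "pasted form: $pasted"
echo "arguments passed: $n"
```

Either quoting option passes the whole message as a single argument; which one to use is a matter of taste.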
[14:48:15] Steinsplitter: I'm ready [14:48:48] (03PS8) 10Elukey: Add JVM Heap usage alarms for basic Hadoop daemons [puppet] - 10https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) [14:49:08] Steinsplitter: I'll monitor logs and centralauth.renameuser_status table [14:49:21] Dereckson: https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/WikiBenutzer [14:49:27] looks like it is processing fine :) [14:49:30] 99 pending [14:49:40] 06Operations, 10DBA, 10Gerrit, 13Patch-For-Review, 07Upstream: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#2949174 (10Paladox) @jcrespo hi, apparently you can set mutiple mysqld for example # [mysqld2] # port = 3307 # datadir... [14:49:48] 97 [14:50:34] Steinsplitter: it decreases slowly but steadily, 96 95 93 [14:50:40] 06Operations, 10Wikimedia-General-or-Unknown: Increase $wgHTTPImportTimeout to a higher value on WMF wikis - https://phabricator.wikimedia.org/T155209#2949175 (10TTO) The HTTP timeout has been increased to 50 seconds. I managed to import all 2,063 revisions of "Digital television" from enwiki to testwiki. Pl... [14:51:23] hashar: ok, I need help [14:52:10] Dereckson: sometimes (when i did it the last times) it has taken 20 minutes ca. [14:52:12] zeljkof: what is up ? [14:52:15] what I did so far: [14:52:18] zfilipin@tin:/srv/mediawiki-staging/php-1.29.0-wmf.7$ git log HEAD..origin/wmf/1.29.0-wmf.7 [14:52:20] Steinsplitter: 84 remaining [14:52:31] hashar: that shows the correct commit in log [14:52:46] and what is blocking you? [14:53:02] hashar: but "git status" says: "HEAD detached from 0ef91c6" [14:53:14] so not sure what to rebase where o.O [14:53:21] git rebase origin/wmf/1.29.0-wmf.7 [14:54:10] in /srv/mediawiki-staging/php-1.29.0-wmf.7? [14:54:18] with detached HEA [14:54:21] HEAD? 
[14:54:21] yes [14:54:31] ok [14:54:34] and you ping ostriches to commit [14:54:47] my git-fu is not strong, I guess [14:55:30] you can rebase as long as the files don't diverge [14:55:48] zeljkof: try: git log --decorate --oneline --graph HEAD...origin/wmf/1.29.0-wmf.7 [14:55:58] that would show: [14:56:08] origin/wmf/1.29.0-wmf.7 (which is what we want [14:56:15] HEAD (what is currently on tin) [14:56:28] (wmf/1.29.0-wmf.7) the local branch [14:56:41] so yeah just rebase HEAD [14:56:58] what Dereckson said really: git rebase HEAD origin/wmf/1.29.0-wmf.7 [14:57:17] which move you on top of origin/wmf/1.29.0-wmf.7 [14:57:38] oh man [14:57:53] is this too complicated, or is it just me?! o.O [14:57:58] ok, so: git rebase HEAD origin/wmf/1.29.0-wmf.7 [14:57:59] complicated [14:58:50] but after a while to merge branches, that will be okay, you'll get an intimate understanding about how Git works [14:59:20] I have thought my git-fu was at least intermediate, but I see there is much to learn... [14:59:42] 06Operations, 10DBA, 10Gerrit, 13Patch-For-Review, 07Upstream: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#2949178 (10Marostegui) >>! In T145885#2949174, @Paladox wrote: > @jcrespo hi, apparently you can set mutiple mysqld > > for exam... [14:59:51] oh s*** [15:00:06] zfilipin@tin:/srv/mediawiki-staging/php-1.29.0-wmf.7$ git rebase HEAD origin/wmf/1.29.0-wmf.7 [15:00:12] Cannot rebase: You have unstaged changes. [15:00:16] 06Operations, 10DBA, 10Gerrit, 13Patch-For-Review, 07Upstream: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#2949179 (10Paladox) oh [15:00:19] Please commit or stash them. [15:00:28] ahh [15:00:42] there is an uncommitted live hack yeah [15:01:06] Steinsplitter: 53 [15:01:08] let me fix it up [15:01:30] https://phabricator.wikimedia.org/P4758 [15:01:59] don't do that [15:02:09] should I delete that paste? 
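Note on the refusal at 15:00:12 ("Cannot rebase: You have unstaged changes"): the situation can be reproduced in a throwaway repository. The sketch below is an illustration under stated assumptions, not the procedure used on tin: the repositories, the branch name `demo`, and the file names are all hypothetical. It commits the local change before rebasing, which is one of the two options Git's message offers (the other is `git stash`).

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
# Identity for the demo commits, so no global Git configuration is needed.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# "upstream" stands in for the remote holding the release branch.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/demo      # hypothetical branch name
echo one > file.txt
echo original > other.txt
git add file.txt other.txt
git commit -q -m "base"

# "staging" stands in for the deployment checkout.
git clone -q "$work/upstream" "$work/staging"

# A new commit lands upstream after the clone.
echo two >> file.txt
git commit -q -am "upstream fix"

cd "$work/staging"
git checkout -q --detach HEAD              # detached HEAD, as in the log
echo "live hack" >> other.txt              # uncommitted local change
git fetch -q origin

# Review what the rebase would bring in.
git log --oneline HEAD..origin/demo

# With unstaged changes to a tracked file, git refuses to rebase.
if git rebase origin/demo 2>"$work/rebase.err"; then
    status=rebased
else
    status=refused
fi
echo "first attempt: $status"

# Commit the local change, then the rebase goes through.
git commit -q -am "live hack"
git rebase -q origin/demo
git log --oneline
```

After the second rebase the local commit sits on top of the fetched one, which matches the "committed an uncommitted live hack" step logged at 15:07:59.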
[15:02:23] just realized there are some commits that might not be public [15:02:29] zeljkof: I can't tell really [15:04:12] zeljkof: could you change the visibility settings, so only yourself can see the paste? [15:04:48] Dereckson: zeljkof I did [15:04:52] restricted to just releng [15:05:08] zeljkof: so in short there is a live hack on the cluster [15:05:11] uncommitted [15:05:16] I have crafted a commit [15:05:50] Steinsplitter: 28 remaining [15:06:15] Dereckson: Thx :) [15:06:26] !log extending EU SWAT until 332766 is deployed [15:06:28] zeljkof: will fix the mess [15:06:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:06:52] hashar: you will deploy 332766? or will you let me know when I can continue? [15:07:47] Dereckson: should I ping ostriches to do something with 332766, once it is deployed? not sure if I understood you [15:07:59] !log tin.eqiad.wmnet : committed an uncommitted live hack for php-1.29.0-wmf.7/includes/AutoLoader.php by ostriches [15:08:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:08:20] (03PS2) 10Muehlenhoff: Stick with node 4.6 on maps due to karthotherian not being ready for node 6 [puppet] - 10https://gerrit.wikimedia.org/r/332768 (https://phabricator.wikimedia.org/T149331) [15:08:58] I did the rebase [15:09:23] Dereckson: zeljkof you can now scap pull on terbium [15:09:25] to test maintenance/importImages.php [15:09:29] if there is something to test [15:09:37] poking chad in a back channel [15:10:18] 06Operations, 10DBA, 10Gerrit, 13Patch-For-Review, 07Upstream: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#2949187 (10Paladox) @Marostegui i guess we should do the conversion of db, it will at least stop gerrit making error's. It will j... [15:10:51] hashar: ok, so instead of 'scap pull' at mwdebug1002, I do it at terbium? 
[15:11:28] yeah [15:11:31] which brings it to prod on that server [15:12:04] ok [15:12:05] so wanna watch logstash for host:terbium ( https://logstash.wikimedia.org/goto/a7c93d80492bfbd2372036e6c8294659 ) [15:15:49] Testing. [15:16:49] Test will start in 3 minutes. [15:17:07] Steinsplitter: all looks good [15:18:26] Dereckson: the patch is at terbium, please test [15:18:26] hashar: so when the patch is tested there, I should still do a full scap deploy, right? [15:18:26] * zeljkof is confused [15:18:34] only of the file [15:18:36] scap sync-file php-1.29.0-wmf.7/maintenance/importImages.php 'maintenance/importImages: Don't sleep after the last upload' [15:18:53] hashar: sure, sorry, "full scap" for the file [15:20:42] Dereckson: thanks again :) [15:20:59] ok, deploying then [15:21:05] Steinsplitter: do you have another rename task? [15:21:25] Steinsplitter: I see on https://phabricator.wikimedia.org/T155185 there is WikiLovesESBot → DisBot [15:21:29] Dereckson: oh, wait, was that "works" for me? 
:) [15:21:37] zeljkof: yes [15:21:42] ok, then deploying [15:22:05] hi Urbanecm [15:23:48] !log zfilipin@tin Synchronized php-1.29.0-wmf.7/maintenance/importImages.php: SWAT: [[gerrit:332766|maintenance/importImages: Dont sleep after the last upload]] (duration: 00m 41s) [15:23:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:24:07] Dereckson, hashar: ^ [15:24:13] !log finished EU SWAT [15:24:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:25:08] \O/ [15:32:53] 06Operations: Integrate jessie 8.7 point release - https://phabricator.wikimedia.org/T155401#2949235 (10MoritzMuehlenhoff) These are fully rolled out: bash ganeti-instance-debootstrap libio-socket-ssl-perl sed [15:41:29] !log oblivian@puppetmaster1001 conftool action : set/weight=15; selector: service=apache2,cluster=api_appserver,dc=eqiad,name=mw1(1[8-9]|2[0-1]|22[0-5]).* [15:41:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:45:07] !log oblivian@puppetmaster1001 conftool action : set/pooled=inactive; selector: service=nginx,cluster=api_appserver,dc=eqiad,name=mw123[0-5].* [15:45:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:45:36] !log oblivian@puppetmaster1001 conftool action : set/pooled=inactive; selector: service=nginx,cluster=api_appserver,dc=eqiad,name=mw122[6-9].* [15:45:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:48:17] (03PS2) 10Filippo Giunchedi: site: add fluorine's roles to mwlog2001 [puppet] - 10https://gerrit.wikimedia.org/r/332527 (https://phabricator.wikimedia.org/T123728) [15:48:42] <_joe_> we are definitely moving away from the misc naming, heh? 
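Note on the conftool selector logged at 15:41:29, `name=mw1(1[8-9]|2[0-1]|22[0-5]).*`: the alternation is easy to misread, so a quick check against sample names can help before running an action. The sketch below assumes the pattern has to match the whole name (hence `grep -x`), which is an assumption about conftool rather than something the log states. The host names are hypothetical examples, and conftool itself is not invoked.

```shell
#!/bin/sh
# Selector pattern copied from the 15:41:29 log entry.
pattern='mw1(1[8-9]|2[0-1]|22[0-5]).*'

# Hypothetical host names on either side of the intended range.
candidates='mw1179.eqiad.wmnet
mw1180.eqiad.wmnet
mw1219.eqiad.wmnet
mw1225.eqiad.wmnet
mw1226.eqiad.wmnet'

# -E: extended regular expressions; -x: the whole line must match.
matched=$(printf '%s\n' "$candidates" | grep -Ex "$pattern")

echo "$matched"
```

Under these assumptions the pattern covers names starting with mw118, mw119, mw120, mw121, and mw1220 through mw1225, so mw1179 and mw1226 fall outside it.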
[15:48:48] <_joe_> I like mwlog FWIW [15:49:00] <_joe_> easier to remember for newcomers too [15:49:34] yeah worth doing while we're at it [15:51:18] (03CR) 10Filippo Giunchedi: [C: 032] site: add fluorine's roles to mwlog2001 [puppet] - 10https://gerrit.wikimedia.org/r/332527 (https://phabricator.wikimedia.org/T123728) (owner: 10Filippo Giunchedi) [15:55:12] PROBLEM - DPKG on mwlog2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [15:55:54] (03PS10) 10Elukey: Add JVM Heap usage alarms for basic Hadoop daemons [puppet] - 10https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) [15:56:12] RECOVERY - DPKG on mwlog2001 is OK: All packages OK [16:03:14] (03PS11) 10Elukey: Add JVM Heap usage alarms for basic Hadoop daemons [puppet] - 10https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) [16:06:42] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:07:32] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy [16:07:35] (03CR) 10Ottomata: Add JVM Heap usage alarms for basic Hadoop daemons (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) (owner: 10Elukey) [16:11:43] (03PS1) 10Yuvipanda: labs: setup instancedumper on californium instead of silver [puppet] - 10https://gerrit.wikimedia.org/r/332775 [16:13:02] (03CR) 10Gehel: Stick with node 4.6 on maps due to karthotherian not being ready for node 6 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/332768 (https://phabricator.wikimedia.org/T149331) (owner: 10Muehlenhoff) [16:14:54] zeljkof, Dereckson: Rolled back my live-hack from wmf.7 that was committed. 
That was from some testing I did last week and forgot to abandon [16:15:12] ostriches: cool [16:21:10] (03CR) 10Gehel: Stick with node 4.6 on maps due to karthotherian not being ready for node 6 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/332768 (https://phabricator.wikimedia.org/T149331) (owner: 10Muehlenhoff) [16:24:20] ostriches: ack'ed [16:24:37] (03CR) 10Dzahn: [C: 04-1] "Error: Could not find class ::ganglia::packages" [puppet] - 10https://gerrit.wikimedia.org/r/332100 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [16:26:18] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: Requesting access to hive/webrequest data for demon - https://phabricator.wikimedia.org/T155198#2949360 (10demon) Logged in just fine, thanks! [16:29:54] (03PS12) 10Elukey: Add JVM Heap usage alarms for basic Hadoop daemons [puppet] - 10https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) [16:31:00] (03PS1) 10Filippo Giunchedi: xenon: move to base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/332776 (https://phabricator.wikimedia.org/T123728) [16:31:39] (03PS1) 10Hashar: (WIP) run tests against multiple mw versions (WIP) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332777 [16:31:55] (03CR) 10jerkins-bot: [V: 04-1] xenon: move to base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/332776 (https://phabricator.wikimedia.org/T123728) (owner: 10Filippo Giunchedi) [16:33:03] (03CR) 10BBlack: [C: 031] Switch cache servers in ulsfo to timesyncd [puppet] - 10https://gerrit.wikimedia.org/r/330865 (https://phabricator.wikimedia.org/T150257) (owner: 10Muehlenhoff) [16:33:24] (03CR) 10jerkins-bot: [V: 04-1] (WIP) run tests against multiple mw versions (WIP) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332777 (owner: 10Hashar) [16:33:52] (03CR) 10BBlack: [C: 031] "4xx is definitely more-correct than 5xx here. The server/service hasn't failed, the client has sent a bad request." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/332643 (owner: 10Chad) [16:36:05] (03PS13) 10Elukey: Add JVM Heap usage alarms for basic Hadoop daemons [puppet] - 10https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) [16:37:19] (03CR) 10Elukey: Add JVM Heap usage alarms for basic Hadoop daemons (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) (owner: 10Elukey) [16:41:20] jouncebot: next [16:41:20] In 2 hour(s) and 18 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170118T1900) [16:43:02] (03PS2) 10Filippo Giunchedi: xenon: move to base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/332776 (https://phabricator.wikimedia.org/T123728) [16:43:54] (03CR) 10jerkins-bot: [V: 04-1] xenon: move to base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/332776 (https://phabricator.wikimedia.org/T123728) (owner: 10Filippo Giunchedi) [16:46:15] (03PS3) 10Filippo Giunchedi: xenon: move to base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/332776 (https://phabricator.wikimedia.org/T123728) [16:49:01] (03PS1) 10Filippo Giunchedi: xenon: pass strings to get_tag [puppet] - 10https://gerrit.wikimedia.org/r/332779 [17:04:12] RECOVERY - cassandra-a service on restbase-dev1001 is OK: OK - cassandra-a is active [17:05:14] (03CR) 10Yuvipanda: [C: 032] labs: setup instancedumper on californium instead of silver [puppet] - 10https://gerrit.wikimedia.org/r/332775 (owner: 10Yuvipanda) [17:09:22] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service] [17:09:32] (03CR) 10Ottomata: [C: 031] kafka: fix Unrecognized escape sequence '\.' 
[puppet] - 10https://gerrit.wikimedia.org/r/331451 (owner: 10Hashar) [17:10:38] (03PS1) 10Andrew Bogott: Horizon puppettab: display profiles as well as roles [puppet] - 10https://gerrit.wikimedia.org/r/332781 [17:11:09] 06Operations, 10DBA, 10Gerrit, 13Patch-For-Review, 07Upstream: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#2949434 (10Paladox) @Marostegui hi, i managed to do mysql_multi, it took a while to setup as i have never done it. but in the end... [17:11:28] !log Silenced shinken, and icinga on labstore1001 for misc nfs migration T154336 [17:11:40] 06Operations, 10MediaWiki-Configuration, 06Performance-Team, 06Services (watching), and 5 others: Integrating MediaWiki (and other services) with dynamic configuration - https://phabricator.wikimedia.org/T149617#2949436 (10Joe) [17:11:43] 06Operations, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, 07Wikimedia-Multiple-active-datacenters: Prepare and improve the datacenter switchover procedure - https://phabricator.wikimedia.org/T154658#2949435 (10Joe) [17:12:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:12:59] !log Disabling puppet across labs instances with NFS (/home and/or /data/project) mounted for T154336 [17:13:01] T154336: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336 [17:14:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:15:54] (03PS2) 10Andrew Bogott: Horizon puppettab: display profiles as well as roles [puppet] - 10https://gerrit.wikimedia.org/r/332781 [17:19:18] 06Operations, 10Parsoid, 15User-Joe, 15User-mobrovac: Parsoid timing out or failing when trying to parse specific user page - https://phabricator.wikimedia.org/T155618#2949470 (10mobrovac) The request limit in Parsoid is [set to 110s](https://github.com/wikimedia/mediawiki-services-parsoid-deploy/blob/1d75... 
[17:20:09] (03PS3) 10Andrew Bogott: Horizon puppettab: display profiles as well as roles [puppet] - 10https://gerrit.wikimedia.org/r/332781 [17:20:59] 06Operations, 10DBA, 10Gerrit, 13Patch-For-Review, 07Upstream: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#2949474 (10Marostegui) >>! In T145885#2949434, @Paladox wrote: > @Marostegui hi, i managed to do mysql_multi, it took a while to... [17:23:21] 06Operations, 10DBA, 10Gerrit, 13Patch-For-Review, 07Upstream: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#2949483 (10Paladox) oh, sorry i didn't realise that the firewall / puppet and monitoring would need to be changed. Is there any... [19:21:01] !log demon@tin Synchronized dblists/compact-language-links.dblist: New dblist (duration: 00m 39s) [19:21:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:42:33] (03PS5) 10Giuseppe Lavagetto: Generalize entities definitions [software/conftool] - 10https://gerrit.wikimedia.org/r/288609 [19:46:42] hipsterpanda, from beta: PHP fatal error /srv/mediawiki/wmf-config/CommonSettings.php line 173: exception 'Exception' with message 'MWWikiversions::readDbListFile(): unable to read compact-language-links. 
[19:46:50] I know [19:46:53] It's fixed [19:46:55] On next run [19:47:05] k, thanks [19:47:05] production won't have this issue (order of operations of how I sync'd mattered) [19:47:11] ahh [19:51:46] !log restarted pybal on lvs2006 [19:51:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:56:08] !log restarted pybal on lvs2003 [19:56:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:57:34] (03CR) 10Chad: [C: 032] Moving group1 to wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332793 (owner: 10Chad) [19:58:58] (03Merged) 10jenkins-bot: Moving group1 to wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332793 (owner: 10Chad) [19:59:09] (03CR) 10jenkins-bot: Moving group1 to wmf.8 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332793 (owner: 10Chad) [19:59:14] PROBLEM - puppet last run on db1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:00:04] ostriches: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170118T2000). Please do the needful. [20:01:39] !log demon@tin Started scap: group1 to wmf.8 [20:01:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:04:14] RECOVERY - Check systemd state on restbase-dev1001 is OK: OK - running: The system is fully operational [20:08:01] I hate the php -> php-* symlink [20:08:04] Hate hate hate [20:08:08] Probably impossible to kill [20:08:09] [20:11:03] 06Operations, 10Pybal, 10Traffic: Unhandled pybal error causing services to be depooled in etcd but not in lvs - https://phabricator.wikimedia.org/T134893#2281050 (10Volans) I encountered a similar issue today, this is the log on when it started: ``` Jan 12 13:19:09 lvs2003 pybal[23011]: [pybal] INFO: [api_... [20:11:54] hipsterpanda: ori and I looked at that a really long time ago. 
I think we finally decided the only way to figure out what it would break would be to remove it :/ [20:12:37] It shouldn't affect anything in the web flow in theory [20:12:51] it may break some really old static media links [20:13:06] but it could break random scripts/crons [20:14:06] Yeah [20:14:25] "php" isn't super easy to grep for in a PHP codebase :p [20:16:23] /srv/mediawiki(-staging)/php is easy to find [20:16:28] But what about ./php? [20:16:37] Or $some_var . "php" [20:16:38] Or [20:16:39] Or [20:17:02] I'm sure lots of scripts will break [20:17:03] Heh [20:19:40] if we had to keep it around, I'd rename it to php-latest so it's easier to find usages of [20:22:08] 06Operations, 10ops-eqiad: Degraded RAID on ms1001 - https://phabricator.wikimedia.org/T152367#2950379 (10Volans) @Cmjohnson sorry, I'm not the right person to ask. I've just put the correct output of the script given that it was timing out through NRPE. [20:27:14] RECOVERY - puppet last run on db1043 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [20:28:47] !log nuria@tin Starting deploy [analytics/refinery@666d98d]: (no message) [20:28:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:31:07] !log nuria@tin Finished deploy [analytics/refinery@666d98d]: (no message) (duration: 02m 19s) [20:31:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:38:54] PROBLEM - cassandra CQL 10.64.48.46:9042 on restbase-dev1003 is CRITICAL: connect to address 10.64.48.46 and port 9042: Connection refused [20:38:54] PROBLEM - Restbase root url on restbase-dev1003 is CRITICAL: connect to address 10.64.48.46 and port 7231: Connection refused [20:39:14] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[20:39:14] PROBLEM - cassandra service on restbase-dev1003 is CRITICAL: CRITICAL - Unit cassandra is active but reported SubState exited, wanted running [20:39:14] PROBLEM - restbase endpoints health on restbase-dev1003 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.48.46, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by NewConnectionError(urllib3.connection.HTTPConnection object at 0x7fa3b493c950: Failed to establish a new connection: [Errno 111] Connection refused,)) [20:39:14] PROBLEM - cassandra SSL 10.64.48.46:7001 on restbase-dev1003 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [20:39:58] urandom: is it you? [20:40:47] volans: sort of [20:40:51] volans: maybe [20:40:59] volans: probably [20:41:18] volans: yes. [20:41:51] lol [20:42:22] volans: tl;dr, what changed is that the scheduled downtime in icinga came and went :) [20:43:37] eheheh classic! [20:47:56] !log Upgraded nodejs to v6 on wtp1001 T149331 [20:47:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:48:00] T149331: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331 [20:48:12] !log demon@tin Finished scap: group1 to wmf.8 (duration: 46m 33s) [20:48:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:57:14] PROBLEM - puppet last run on mw1284 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:58:51] (03PS1) 10Eevans: restbase-dev: rack assignment [puppet] - 10https://gerrit.wikimedia.org/r/332823 (https://phabricator.wikimedia.org/T153880) [21:00:04] gwicke, cscott, arlolra, subbu, bearND, halfak, Amir1, and yurik: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170118T2100). Please do the needful. 
[21:00:50] !log restarted pybal on lvs2004 (passive) T134893 [21:00:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:00:55] T134893: Unhandled pybal error causing services to be depooled in etcd but not in lvs - https://phabricator.wikimedia.org/T134893 [21:08:31] (03CR) 10Ottomata: [C: 031] "Haven't read all the files, but +1 to the idea" [puppet] - 10https://gerrit.wikimedia.org/r/332106 (https://phabricator.wikimedia.org/T93645) (owner: 10Juniorsys) [21:09:10] (03CR) 10Ottomata: [C: 031] Introduce linters using rake [puppet/kafkatee] - 10https://gerrit.wikimedia.org/r/331328 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [21:09:14] PROBLEM - puppet last run on db1052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:09:24] (03CR) 10Ottomata: [C: 031] Introduce linters using rake [puppet/jmxtrans] - 10https://gerrit.wikimedia.org/r/331327 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [21:09:31] hi [21:09:38] (03CR) 10Ottomata: [C: 031] Introduce linters using rake [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/331330 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [21:09:40] (03CR) 10Ottomata: [C: 031] Introduce linters using rake [puppet/zookeeper] - 10https://gerrit.wikimedia.org/r/331332 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [21:10:35] 06Operations, 10Parsoid, 15User-Joe, 15User-mobrovac: Parsoid timing out or failing when trying to parse specific user page - https://phabricator.wikimedia.org/T155618#2948681 (10ssastry) The PHP parser also gives up with lots of errors like this on the page: ``` ... S08W039 Lua error: too many expensive... 
[21:12:59] (03PS2) 10Eevans: restbase-dev: rack assignment [puppet] - 10https://gerrit.wikimedia.org/r/332823 (https://phabricator.wikimedia.org/T153880) [21:17:54] jouncebot: next [21:17:54] In 2 hour(s) and 42 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170119T0000) [21:18:53] !log restarted pybal on lvs2001 (active) T134893 [21:19:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:19:00] T134893: Unhandled pybal error causing services to be depooled in etcd but not in lvs - https://phabricator.wikimedia.org/T134893 [21:19:16] (03PS1) 10Ottomata: Fix 'invalid byte sequence in US-ASCII' on non Jessie passenger puppetmasters [puppet] - 10https://gerrit.wikimedia.org/r/332853 [21:20:02] hipsterpanda: any objections to me doing an update for wikimania-scholarships? We've got a few text strings and the addition of a missing language community for the actual application form [21:20:10] No objections [21:20:20] cool beans [21:20:22] ottomata: dpkg -s says you were the last person to package git-fat. This sound familiar? [21:20:43] hipsterpanda: that sounds correct! [21:20:46] been a while, but ya [21:21:16] So, I've got 2 patches (and thcipriani has been noodling a third) that would be nice to land. Would you be ok with packaging once those land? [21:21:34] anomie: can you please look at : https://phabricator.wikimedia.org/T155668 ? 
[21:21:47] Trying to solve the "sometimes fat files end up as plaintext refs and not actual blobs of fat" [21:22:27] matanya: I see hipsterpanda and addshore are already discussing that one in #mediawiki-core [21:22:50] matanya: Yep, known...patch in progress [21:22:55] Spotted a bit ago [21:23:02] FWIW the long explanation of why git-fat doesn't work sometimes is here: https://phabricator.wikimedia.org/T147856#2885665 [21:23:50] tl;dr: git tries to be helpful and store stat in an index which causes git-filter (smudge and clean) to sometimes not work. [21:24:19] thanks anomie [21:24:21] hipsterpanda: i am having some cognitive dissonance talking with your alternate identity in another channel [21:24:24] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[zotero/translators],Package[zotero/translation-server],Exec[chown /srv/deployment/zotero for deploy-service] [21:24:26] but +1 to whatever you need to do to git-fat [21:24:27] :) [21:24:35] Thanks, heh [21:24:42] Ok, back to normal [21:25:14] RECOVERY - puppet last run on mw1284 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:25:25] ostriches: https://phabricator.wikimedia.org/T155667 is the same case then ? [21:25:36] Yes [21:25:40] The LQT is already fixed [21:25:44] Ostriches do you need help with the wikibase thing [21:25:55] ohia audephone ! 
:D [21:25:56] Idk maybe addshore was looking [21:25:57] addshore is already helping me [21:26:00] Thx tho [21:26:01] Cool [21:26:12] yeh, feel free to leave it to me :) [21:26:21] Thanks :) [21:26:23] (03CR) 10Ottomata: [C: 04-1] "_joe_ says this won't work:" [puppet] - 10https://gerrit.wikimedia.org/r/332853 (owner: 10Ottomata) [21:26:27] ostriches: so please close at some point :) [21:26:38] If there's any other issues I could help once I get home [21:28:09] matanya: I fixed it before you filed a task :p [21:28:24] (03CR) 10Volans: [C: 032] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/332823 (https://phabricator.wikimedia.org/T153880) (owner: 10Eevans) [21:28:25] ostriches: faster than light! [21:29:08] More worrying is the OOMs I'm seeing in group0 and group1 [21:29:14] Not a ton, but group2 scares me with OOMs [21:32:17] !log Updated wikimania-scholarships to 29ba0ec "Add Tulu (tcy) to Communities" (T155666) [21:32:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:32:22] T155666: Add Tulu (tcy) to "Primary language community on wiki" list - https://phabricator.wikimedia.org/T155666 [21:33:24] (03CR) 10Chad: [C: 032] Pull in all upstream changes from https://github.com/jedbrown/git-fat/blob/master/git-fat [debs/git-fat] - 10https://gerrit.wikimedia.org/r/330464 (owner: 10Chad) [21:35:01] (03CR) 10Chad: [C: 032] First iteration at making git-fat somewhat legible [debs/git-fat] - 10https://gerrit.wikimedia.org/r/330454 (owner: 10Chad) [21:35:14] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational [21:35:14] RECOVERY - cassandra-a SSL 10.64.0.36:7001 on restbase-dev1001 is OK: SSL OK - Certificate restbase-dev1001-a valid until 2018-01-05 22:53:02 +0000 (expires in 352 days) [21:36:11] matanya: that wikimania-scholarships update should have your text tweaks in it too [21:36:25] it is there, thanks bd808 [21:37:14] RECOVERY - puppet last run on db1052 is OK: OK: Puppet is 
currently enabled, last run 10 seconds ago with 0 failures [21:38:14] PROBLEM - Check systemd state on restbase-test1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [21:40:14] RECOVERY - Check systemd state on restbase-test1001 is OK: OK - running: The system is fully operational [21:42:12] ostriches: looks like I might have to look into why the Scribunto tests on jenkins for Wikibase are failing first... :/ [21:42:24] RECOVERY - cassandra-a CQL 10.64.0.36:9042 on restbase-dev1001 is OK: TCP OK - 0.000 second response time on 10.64.0.36 port 9042 [21:43:35] (03CR) 10Ottomata: [C: 031] udp2log: move to service_unit and systemd [puppet] - 10https://gerrit.wikimedia.org/r/313604 (https://phabricator.wikimedia.org/T123728) (owner: 10Filippo Giunchedi) [21:45:52] no mobileapps deploy today [21:49:52] (03PS1) 10Eevans: Enable instance restbase-dev1001-b.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/332876 (https://phabricator.wikimedia.org/T153880) [21:52:24] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [22:07:11] 06Operations, 10GlobalRename, 10MediaWiki-extensions-CentralAuth: Rename user TextworkerBot to VladiBot on ru.wiki - https://phabricator.wikimedia.org/T153602#2950679 (10Legoktm) 05stalled>03declined Thanks for understanding. [22:10:24] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit nfs-exportd is inactive [22:10:37] ^ got it [22:13:41] 06Operations, 06Operations-Software-Development, 10Pybal, 10Traffic: Unhandled pybal error causing services to be depooled in etcd but not in lvs - https://phabricator.wikimedia.org/T134893#2950697 (10Volans) [22:20:24] PROBLEM - puppet last run on labvirt1012 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [22:46:19] !log Reenabled nfs-exportd and puppet on labstore1004. All of misc being exported as rw now. T154336 [22:46:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:46:26] T154336: Migrate misc to secondary labstore HA cluster - https://phabricator.wikimedia.org/T154336 [22:48:50] !log demon@tin Synchronized php-1.29.0-wmf.8/includes/widget/search/FullSearchResultWidget.php: Unbreak hook mess (duration: 00m 45s) [22:48:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:49:24] RECOVERY - puppet last run on labvirt1012 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [22:49:48] !log demon@tin Synchronized php-1.29.0-wmf.8/extensions/LiquidThreads/classes/Hooks.php: Unbreak hook mess (duration: 00m 41s) [22:49:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:56:43] 06Operations, 10Electron-PDFs, 06TCB-Team, 13Patch-For-Review, and 2 others: Deploy ElectronPdfService Extension to metawiki - https://phabricator.wikimedia.org/T150943#2950942 (10Addshore) [22:56:53] 06Operations, 10Electron-PDFs, 06TCB-Team, 13Patch-For-Review, and 2 others: Deploy ElectronPdfService Extension to dewiki - https://phabricator.wikimedia.org/T150942#2950943 (10Addshore) [22:57:00] (03PS2) 10Eevans: Enable remaining restbase-dev* instances [puppet] - 10https://gerrit.wikimedia.org/r/332876 (https://phabricator.wikimedia.org/T153880) [22:57:32] 06Operations, 10Electron-PDFs, 06TCB-Team, 13Patch-For-Review, 07User-notice: Deploy ElectronPdfService Extension to production - https://phabricator.wikimedia.org/T150185#2950944 (10Addshore) [22:58:28] (03CR) 10Eevans: [C: 04-1] "Not quite ready to merge..." 
[puppet] - 10https://gerrit.wikimedia.org/r/332876 (https://phabricator.wikimedia.org/T153880) (owner: 10Eevans) [22:58:46] (03PS2) 10Addshore: Enable ElectronPdfService extension on metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324488 (https://phabricator.wikimedia.org/T150943) [22:59:13] (03PS2) 10Addshore: Enable ElectronPdfService extension on dewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324489 (https://phabricator.wikimedia.org/T150942) [22:59:51] (03PS3) 10Addshore: Enable ElectronPdfService extension on metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324488 (https://phabricator.wikimedia.org/T150943) [22:59:56] !log demon@tin Synchronized php-1.29.0-wmf.8/extensions/ProofreadPage/includes/index/ProofreadIndexPage.php: Unbreak, T155682 (duration: 00m 39s) [23:00:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:00:04] T155682: Unable do create any page in Index namespace of wikisources - https://phabricator.wikimedia.org/T155682 [23:00:08] (03PS3) 10Addshore: Enable ElectronPdfService extension on dewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324489 (https://phabricator.wikimedia.org/T150942) [23:05:03] 06Operations: Rename the TC team mailing list - https://phabricator.wikimedia.org/T155683#2950970 (10Quiddity) [23:09:08] (03PS1) 10Andrew Bogott: Keystone hooks: Set up default security groups for new projects. [puppet] - 10https://gerrit.wikimedia.org/r/332899 (https://phabricator.wikimedia.org/T136871) [23:10:37] has something funky happened to labs? All my instances appear to be broken and my home directory seems to have been wiped? [23:12:42] jdlrobson: there's NFS migration going on, see the labs-l list [23:13:08] legoktm: ahh that explains things. 
[23:13:24] https://lists.wikimedia.org/pipermail/labs-l/2017-January/004849.html [23:18:12] jdlrobson, yeah, what servers specifically but it's in the process of correcting [23:18:32] unfortunately some side effect killed /home contents on a subset of servers and we are restoring at the moment [23:18:54] several things - i hadn't clicked that my instances would be impacted by the migration [23:19:02] (03PS2) 10Volans: Add missing comment for some Ganeti instances [dns] - 10https://gerrit.wikimedia.org/r/332710 [23:19:38] jdlrobson, they were not meant to be man, it's no bueno but hopefully short lived issue [23:19:49] it was just i could ssh in and was a bit alarmed to see git repos wiped / folders and files missing [23:20:02] so started panicking [23:20:06] (03CR) 10jerkins-bot: [V: 04-1] Keystone hooks: Set up default security groups for new projects. [puppet] - 10https://gerrit.wikimedia.org/r/332899 (https://phabricator.wikimedia.org/T136871) (owner: 10Andrew Bogott) [23:20:09] what instance? [23:20:26] a welcome message would be cool [23:20:39] pushipedia.eqiad.wmflabs [23:20:54] nomad.eqiad.wmflabs [23:20:57] trending.eqiad.wmflabs [23:21:06] ostriches, looks like https://gerrit.wikimedia.org/r/#/c/304206/ has killed Collection [23:21:34] Then revert [23:21:59] jdlrobson, you're right totally, it was an err that has been a scramble to unwind [23:23:09] chasemp: thanks for doing the unwinding :) mistakes happen! [23:23:32] seems like pushipedia is mid recovery, lot of stuff there [23:24:53] ostriches, https://gerrit.wikimedia.org/r/#/c/332902/ [23:33:26] MaxSem: Cherry-picking to wmf.8 [23:33:30] (03PS1) 10Addshore: Add twocolconflict to wgBetaFeaturesWhitelist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332904 (https://phabricator.wikimedia.org/T150184) [23:33:37] thx [23:37:11] (03PS2) 10Andrew Bogott: Keystone hooks: Set up default security groups for new projects. 
[puppet] - 10https://gerrit.wikimedia.org/r/332899 (https://phabricator.wikimedia.org/T136871) [23:38:01] !log demon@tin Synchronized php-1.29.0-wmf.8/extensions/Collection: Unbreak (duration: 00m 40s) [23:38:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:38:14] MaxSem: ^ [23:44:04] (03PS1) 10Addshore: Enable TwoColConflict on test wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332908 [23:44:06] (03PS1) 10Addshore: Enable TwoColConflict on mw.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332909 [23:44:08] (03PS1) 10Addshore: Enable TwoColConflict on metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332910 [23:44:10] (03PS1) 10Addshore: Enable TwoColConflict on dewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/332911 [23:44:40] ostriches, confirm that it's fixed on testwiki [23:51:57] @seen urandom [23:51:57] mutante: I have never seen urandom [23:53:07] mutante: I think is in a meeting ;) [23:53:36] volans: thanks [23:58:36] (03PS3) 10Volans: Add missing comment for some Ganeti instances [dns] - 10https://gerrit.wikimedia.org/r/332710