[00:44:28] (03PS1) 10BryanDavis: wmcs: Make /usr/local/bin/log-command-invocation a no-op [puppet] - 10https://gerrit.wikimedia.org/r/381712 (https://phabricator.wikimedia.org/T166712) [01:14:33] (03PS1) 10BryanDavis: Remove Schema:CommandInvocation EventLogging [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/381713 (https://phabricator.wikimedia.org/T166712) [02:36:54] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.1) (duration: 08m 42s) [02:37:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:43:34] !log l10nupdate@tin ResourceLoader cache refresh completed at Mon Oct 2 02:43:33 UTC 2017 (duration 6m 39s) [02:43:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:22:33] (03Draft2) 10Jayprakash12345: Enable ShortUrl Extension on hiwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381714 [03:23:51] (03PS3) 10Jayprakash12345: Enable ShortUrl Extension on hiwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381714 (https://phabricator.wikimedia.org/T177187) [03:28:15] (03Draft2) 10Jayprakash12345: Enable NewUserMessage Extension on hiwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381715 [03:28:53] (03PS3) 10Jayprakash12345: Enable NewUserMessage Extension on hiwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381715 (https://phabricator.wikimedia.org/T177188) [04:00:25] (03CR) 10Zoranzoki21: [C: 031] Enable ShortUrl Extension on hiwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381714 (https://phabricator.wikimedia.org/T177187) (owner: 10Jayprakash12345) [04:01:54] (03CR) 10Zoranzoki21: [C: 031] Enable NewUserMessage Extension on hiwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381715 (https://phabricator.wikimedia.org/T177188) (owner: 10Jayprakash12345) [04:16:50] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 
200 OK - pattern not found - 1945 bytes in 0.138 second response time [04:41:49] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1936 bytes in 0.114 second response time [05:03:14] (03PS1) 10Zoranzoki21: Enable Sandbox menu to Javanese Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381718 (https://phabricator.wikimedia.org/T176308) [05:04:05] (03PS2) 10Zoranzoki21: Enable Sandbox menu to Javanese Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381718 (https://phabricator.wikimedia.org/T176308) [05:46:49] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [05:47:00] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [05:52:17] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1056 - https://phabricator.wikimedia.org/T177171#3649781 (10Marostegui) [05:52:51] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1056 - https://phabricator.wikimedia.org/T177171#3649267 (10Marostegui) a:03Cmjohnson Hi @Cmjohnson please change this disk whenever you can. Thanks! 
[05:53:18] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1056 - https://phabricator.wikimedia.org/T177171#3649267 (10Marostegui) p:05Triage>03Normal [05:55:09] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [05:55:50] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [06:02:02] (03CR) 10Marostegui: [C: 032] db-eqiad,db-codfw.php: Remove db1036 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381233 (https://phabricator.wikimedia.org/T176311) (owner: 10Marostegui) [06:02:05] (03PS2) 10Marostegui: db-eqiad,db-codfw.php: Remove db1036 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381233 (https://phabricator.wikimedia.org/T176311) [06:06:12] (03CR) 10jenkins-bot: db-eqiad,db-codfw.php: Remove db1036 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381233 (https://phabricator.wikimedia.org/T176311) (owner: 10Marostegui) [06:07:07] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Remove db1036 from config files as it will be decommissioned - T176311 (duration: 00m 48s) [06:07:10] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0] [06:07:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:07:12] T176311: decommission db1036 - https://phabricator.wikimedia.org/T176311 [06:07:33] 10Operations, 10DBA, 10Patch-For-Review: decommission db1036 - https://phabricator.wikimedia.org/T176311#3649832 (10Marostegui) [06:07:59] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] [06:08:10] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Remove db1036 from config files as it will be decommissioned - T176311 (duration: 00m 46s) [06:08:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:10:32] !log restart varnish backend on 
cp3033 [06:10:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:11:35] the other one seems cp3040 - https://grafana.wikimedia.org/dashboard/db/varnish-failed-fetches?orgId=1&from=now-1h&to=now&var-datasource=esams%20prometheus%2Fops&var-cache_type=text [06:14:50] (03PS1) 10Marostegui: mariadb: Remove db1036 for decommissioning [puppet] - 10https://gerrit.wikimedia.org/r/381721 (https://phabricator.wikimedia.org/T176311) [06:17:40] (03CR) 10Marostegui: [C: 032] "Puppet looks good: https://puppet-compiler.wmflabs.org/compiler02/8128/" [puppet] - 10https://gerrit.wikimedia.org/r/381721 (https://phabricator.wikimedia.org/T176311) (owner: 10Marostegui) [06:18:19] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [06:18:59] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [06:19:10] !log restart varnish backend on cp3040 [06:19:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:19:43] (03PS1) 10Marostegui: site.pp: Remove blank space [puppet] - 10https://gerrit.wikimedia.org/r/381722 [06:20:32] (03CR) 10Marostegui: [C: 032] site.pp: Remove blank space [puppet] - 10https://gerrit.wikimedia.org/r/381722 (owner: 10Marostegui) [06:21:47] both hosts looks healthy [06:27:59] PROBLEM - graphite.wikimedia.org on graphite1003 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 398 bytes in 0.001 second response time [06:28:59] RECOVERY - graphite.wikimedia.org on graphite1003 is OK: HTTP OK: HTTP/1.1 200 OK - 1547 bytes in 0.011 second response time [06:30:17] (03CR) 10WMDE-leszek: [C: 031] "I might be missing some context, but Aude's suggestion seems reasonable if there is an intention to have the same setting on both labs and" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381529 (https://phabricator.wikimedia.org/T175109) (owner: 10Ladsgroup) [06:34:37] cp3031 returned a few 503s, nothing big 
enough to trigger alarms [06:39:00] 10Operations, 10DBA, 10Patch-For-Review: decommission db1036 - https://phabricator.wikimedia.org/T176311#3649839 (10Marostegui) [06:39:47] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: decommission db1036 - https://phabricator.wikimedia.org/T176311#3620829 (10Marostegui) a:05Marostegui>03Cmjohnson db1036 is now ready to be totally decommissioned by @Cmjohnson [06:55:08] <_joe_> !log restarting a few API appservers with high cpu load [06:55:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:58:40] (03PS1) 10Marostegui: db-codfw.php: Depool db2034, db2055 and db2062 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381732 (https://phabricator.wikimedia.org/T174509) [07:02:52] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2034, db2055 and db2062 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381732 (https://phabricator.wikimedia.org/T174509) (owner: 10Marostegui) [07:05:07] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2034, db2055 and db2062 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381732 (https://phabricator.wikimedia.org/T174509) (owner: 10Marostegui) [07:06:18] (03CR) 10jenkins-bot: db-codfw.php: Depool db2034, db2055 and db2062 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381732 (https://phabricator.wikimedia.org/T174509) (owner: 10Marostegui) [07:06:30] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2034, db2055 db2062 - T174509 (duration: 00m 46s) [07:06:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:06:37] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509 [07:10:00] !log Optimize table pagelinks and templatelinks on s1: db2034, db2055 db2062 - T174509 [07:10:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:16:17] (03PS1) 10Marostegui: db-codfw.php: Depool db2039 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/381734 (https://phabricator.wikimedia.org/T174509) [07:18:05] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2039 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381734 (https://phabricator.wikimedia.org/T174509) (owner: 10Marostegui) [07:20:20] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2039 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381734 (https://phabricator.wikimedia.org/T174509) (owner: 10Marostegui) [07:20:31] (03CR) 10jenkins-bot: db-codfw.php: Depool db2039 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381734 (https://phabricator.wikimedia.org/T174509) (owner: 10Marostegui) [07:20:53] !log Optimize table pagelinks and templatelinks on s6: db2039 - T174509 [07:20:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:20:59] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509 [07:21:21] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2039 - T174509 (duration: 00m 46s) [07:21:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:32:49] !log Optimize table pagelinks and templatelinks on s6: dbstore2001 - T174509 [07:32:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:32:54] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509 [07:38:08] (03CR) 10Hashar: [C: 04-1] "Yesterday I went ahead and started change to convert to the role/profile/module pattern https://gerrit.wikimedia.org/r/#/q/topic:ciprofi" [puppet] - 10https://gerrit.wikimedia.org/r/379729 (owner: 10Hashar) [07:40:21] (03PS2) 10Ladsgroup: labs: Use redis lock manager for dispatching changes of Wikibase [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381529 (https://phabricator.wikimedia.org/T175109) [07:41:15] (03PS1) 10Marostegui: s2.hosts: Remove db1036 [software] - 
10https://gerrit.wikimedia.org/r/381736 (https://phabricator.wikimedia.org/T176311) [07:41:53] !log Stop MySQL on db1036 as it is going to be decommissioned - https://phabricator.wikimedia.org/T176311 [07:41:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:42:29] (03CR) 10Marostegui: [C: 032] s2.hosts: Remove db1036 [software] - 10https://gerrit.wikimedia.org/r/381736 (https://phabricator.wikimedia.org/T176311) (owner: 10Marostegui) [07:43:16] (03Merged) 10jenkins-bot: s2.hosts: Remove db1036 [software] - 10https://gerrit.wikimedia.org/r/381736 (https://phabricator.wikimedia.org/T176311) (owner: 10Marostegui) [07:44:07] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3649884 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['mw1310.eqiad.wmnet', 'mw1311.eqiad.wmnet... [07:44:23] started the reimage of mw1310.eqiad.wmnet mw1311.eqiad.wmnet, two new jobrunners [07:44:37] (03CR) 10Ladsgroup: "Done" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381529 (https://phabricator.wikimedia.org/T175109) (owner: 10Ladsgroup) [07:47:51] (03PS2) 10Marostegui: mediawiki: Maintenance script to clean up duplicates in wb_terms [puppet] - 10https://gerrit.wikimedia.org/r/381433 (https://phabricator.wikimedia.org/T163551) (owner: 10Ladsgroup) [07:51:12] (03CR) 10Marostegui: [C: 032] mediawiki: Maintenance script to clean up duplicates in wb_terms [puppet] - 10https://gerrit.wikimedia.org/r/381433 (https://phabricator.wikimedia.org/T163551) (owner: 10Ladsgroup) [07:52:45] (03CR) 10Hashar: "Puppet compiler https://puppet-compiler.wmflabs.org/compiler02/8129/contint1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/381646 (owner: 10Hashar) [07:55:28] (03CR) 10Hashar: [C: 04-1] "I dont think that one is right from https://puppet-compiler.wmflabs.org/compiler02/8130/ . 
That drops eg Nrpe::Monitor_service[zuul_gearma" [puppet] - 10https://gerrit.wikimedia.org/r/381647 (owner: 10Hashar) [07:57:22] !log starting a round of cleanup in ores_classification table in wikidatawiki [07:57:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:59:06] !log Drop now redundant indexes from pagelinks and templatelinks from s5 - T174509 [07:59:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:59:12] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509 [07:59:15] (03PS2) 10Hashar: contint: move an include from site.pp to role [puppet] - 10https://gerrit.wikimedia.org/r/381648 [08:01:14] (03CR) 10Hashar: "Noop in puppet compiler https://puppet-compiler.wmflabs.org/compiler02/8131/" [puppet] - 10https://gerrit.wikimedia.org/r/381648 (owner: 10Hashar) [08:02:54] (03PS5) 10Gehel: camus - switch to logrotate::rule [puppet] - 10https://gerrit.wikimedia.org/r/373516 [08:04:35] (03CR) 10Gehel: [C: 032] camus - switch to logrotate::rule [puppet] - 10https://gerrit.wikimedia.org/r/373516 (owner: 10Gehel) [08:12:40] !log Optimize tables pagelinks and templatelinks on s5: db1100 (already depooled) - T174509 [08:12:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:12:45] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509 [08:22:41] (03PS1) 10Giuseppe Lavagetto: Add rubocop validation of the main file, check_commit task [puppet-lint/wmf_styleguide-check] - 10https://gerrit.wikimedia.org/r/381738 [08:24:04] !log stopping dbstore1001 for maintenance [08:24:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:29:39] !log finished the cleanup of ores_classification table in wikidatawiki and starting the enwiki one (T159753) [08:29:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:29:45] 
T159753: Concerns about ores_classification table size on enwiki - https://phabricator.wikimedia.org/T159753 [08:32:19] 10Operations, 10Wiki-Loves-Monuments (2017): Import Wiki Loves Monuments photos from Flickr to Commons - https://phabricator.wikimedia.org/T173056#3649967 (10fgiunchedi) >>! In T173056#3648041, @LilyOfTheWest wrote: > @fgiunchedi a quick note that Multichill and I did some assessments of the number of photos w... [08:34:52] (03CR) 10Filippo Giunchedi: [C: 032] Upgrade to 1.6 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/380747 (https://phabricator.wikimedia.org/T172556) (owner: 10Gilles) [08:35:34] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Add rubocop validation of the main file, check_commit task [puppet-lint/wmf_styleguide-check] - 10https://gerrit.wikimedia.org/r/381738 (owner: 10Giuseppe Lavagetto) [08:40:44] !log Stop MySQL on db2044 to get it ready to replace its mainboard - T174764 [08:40:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:40:51] T174764: db2044 HW RAID failure - https://phabricator.wikimedia.org/T174764 [08:46:30] !log Poweroff db2044 for HW maintenance - T174764 [08:46:31] !log gehel@tin Started deploy [tilerator/deploy@b02bd9a]: (no justification provided) [08:46:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:46:36] T174764: db2044 HW RAID failure - https://phabricator.wikimedia.org/T174764 [08:46:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:46:53] !log gehel@tin Finished deploy [tilerator/deploy@b02bd9a]: (no justification provided) (duration: 00m 22s) [08:46:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:47:14] !log deploying latest tilerator on maps-test only, fix for T175123 [08:47:15] 10Operations, 10ops-codfw, 10DBA: db2044 HW RAID failure - https://phabricator.wikimedia.org/T174764#3649981 (10Marostegui) @Papaul server is now off. 
Feel free to power it on once you are done with it Thank you! [08:47:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:47:19] T175123: tileshell does not honor redis configuration in /etc/tilerator/config.yaml - https://phabricator.wikimedia.org/T175123 [08:51:29] (03PS2) 10Hashar: Convert zuul::server to a profile [puppet] - 10https://gerrit.wikimedia.org/r/381647 [08:52:13] (03CR) 10Hashar: "Added ::profile::zuul::server to role::ci::master." [puppet] - 10https://gerrit.wikimedia.org/r/381647 (owner: 10Hashar) [08:53:37] (03CR) 10Hashar: "https://puppet-compiler.wmflabs.org/compiler02/8132/ better!" [puppet] - 10https://gerrit.wikimedia.org/r/381647 (owner: 10Hashar) [08:53:41] (03PS1) 10Filippo Giunchedi: debian: add python-pyexiv2 dependency [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/381741 [08:57:56] PROBLEM - Disk space on notebook1002 is CRITICAL: DISK CRITICAL - /mnt/hdfs is not accessible: Input/output error [08:58:49] checking --^ [09:00:56] RECOVERY - Disk space on notebook1002 is OK: DISK OK [09:06:04] !log forced remount of /mnt/hdfs on notebook1002 [09:06:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:11:05] PROBLEM - DPKG on thumbor2004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:11:11] that's me ^ [09:11:27] should be recovering by itself [09:12:05] RECOVERY - DPKG on thumbor2004 is OK: All packages OK [09:12:14] !log roll-restart thumbor to upgrade to 1.6 - T172556 [09:12:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:12:21] T172556: Images with embedded thumbnail whose orientation (rotation) is different than that of the main image render thumbnails wrong (smaller than they should be) - https://phabricator.wikimedia.org/T172556 [09:12:46] (03CR) 10Gilles: "It's already a dependency of the thumbor package: https://github.com/gi11es/thumbor-debian/blob/master/thumbor/thumbor-6.3.3rc%2Bgit201708" 
[debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/381741 (owner: 10Filippo Giunchedi) [09:13:33] (03CR) 10Gilles: [C: 032] "Ah, but it's only a build dependency there, not a runtime one. Doh..." [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/381741 (owner: 10Filippo Giunchedi) [09:16:19] (03PS1) 10Hashar: Add .gitreview file [puppet-lint/wmf_styleguide-check] - 10https://gerrit.wikimedia.org/r/381743 [09:16:21] (03PS1) 10Hashar: Fix rubocop and add 'rake test' for CI [puppet-lint/wmf_styleguide-check] - 10https://gerrit.wikimedia.org/r/381744 [09:18:38] (03CR) 10Hashar: "Will let me turn CI on via: https://gerrit.wikimedia.org/r/#/c/381745/" [puppet-lint/wmf_styleguide-check] - 10https://gerrit.wikimedia.org/r/381744 (owner: 10Hashar) [09:27:15] 10Operations, 10OCG-General, 10Reading-Community-Engagement, 10Epic, and 3 others: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service - https://phabricator.wikimedia.org/T150871#3650044 (10ovasileva) [09:27:48] !log finished the cleanup of ores_classification table in enwiki (T159753) [09:27:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:27:54] T159753: Concerns about ores_classification table size on enwiki - https://phabricator.wikimedia.org/T159753 [09:28:31] aaa [09:29:39] triple-a rated comment right there [09:34:58] 10Operations, 10Goal: Reduce technical debt in metrics monitoring - https://phabricator.wikimedia.org/T177195#3650125 (10fgiunchedi) [09:37:45] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [5000.0] [09:37:54] 10Operations, 10Goal: Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3650139 (10fgiunchedi) [09:39:43] 10Operations, 10Goal: Export Prometheus-compatible JVM metrics from JVMs in production - https://phabricator.wikimedia.org/T177197#3650154 (10fgiunchedi) [09:41:54] pretty sure those 
errors are the new jobrunners [09:42:23] ok, I was checking db traffic, because normally when there is a real memcache problem db traffic increases a lot [09:42:31] it is not the case [09:42:52] confirmed, those are all due to the new jobrunners [09:42:57] weird, I thought I fixed the issue [09:43:27] they all seem nutcracker related [09:43:29] 10Operations, 10Goal: Add Prometheus client support for varnish/statsd metrics daemons - https://phabricator.wikimedia.org/T177199#3650187 (10fgiunchedi) [09:43:58] probably there is still a missing dependency [09:44:38] Warning: unable to connect to unix:///var/run/nutcracker/redis_eqiad.sock [09:45:18] check https://logstash.wikimedia.org/goto/980d05ae91e4be29c5f8aa404d43ccd9 [09:48:05] yep, I saw those, I am pretty sure that nutcracker comes up before it is fully configured [09:48:08] or something similar [09:48:25] all the appservers do not complete the first puppet run cleanly [09:49:29] maybe it is only a matter of the jobrunner/chron daemons starting their work too soon? 
[09:55:38] !log stopping and upgrading labsdb1010 [09:55:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:55:53] a couple of proxies will complain about labsdb1010, ignore them [09:57:22] (03CR) 10Filippo Giunchedi: Prometheus based Kafka broker alerts, take 1 (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/381489 (https://phabricator.wikimedia.org/T175923) (owner: 10Ottomata) [09:59:15] PROBLEM - haproxy failover on dbproxy1011 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [09:59:24] that is one [09:59:25] PROBLEM - haproxy failover on dbproxy1010 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [09:59:29] and that is two [10:01:05] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3650213 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1310.eqiad.wmnet', 'mw1311.eqiad.wmnet'] ``` and were **ALL** successful. [10:01:22] jobrunners running now, fatals down to zero [10:02:56] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] [10:04:49] !log restbase restarting on restbase1007 to start logging error bodies for T176728 [10:04:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:04:55] T176728: Parsoid workers die transforming Main_Page - https://phabricator.wikimedia.org/T176728 [10:05:57] Pchelolo: ^ [10:06:06] !log elukey@puppetmaster1001 conftool action : set/pooled=yes; selector: name=mw1310.eqiad.wmnet [10:06:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:06:11] 10Operations, 10monitoring, 10User-fgiunchedi: Upgrade grafana to 4.5.2 - https://phabricator.wikimedia.org/T175980#3650222 (10fgiunchedi) [10:06:11] !log elukey@puppetmaster1001 conftool action : set/pooled=yes; selector: name=mw1311.eqiad.wmnet [10:06:15] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:15:16] RECOVERY - haproxy failover on dbproxy1011 is OK: OK check_failover servers up 2 down 0 [10:15:26] (03PS11) 10Filippo Giunchedi: base: ability to send syslog over TLS [puppet] - 10https://gerrit.wikimedia.org/r/378922 (https://phabricator.wikimedia.org/T136312) [10:15:26] RECOVERY - haproxy failover on dbproxy1010 is OK: OK check_failover servers up 2 down 0 [10:16:18] (03CR) 10Filippo Giunchedi: base: ability to send syslog over TLS (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/378922 (https://phabricator.wikimedia.org/T136312) (owner: 10Filippo Giunchedi) [10:16:35] (03PS12) 10Filippo Giunchedi: base: ability to send syslog over TLS [puppet] - 10https://gerrit.wikimedia.org/r/378922 (https://phabricator.wikimedia.org/T136312) [10:17:48] (03CR) 10Filippo Giunchedi: [C: 032] base: ability to send syslog over TLS [puppet] - 10https://gerrit.wikimedia.org/r/378922 (https://phabricator.wikimedia.org/T136312) (owner: 10Filippo Giunchedi) [10:19:53] (03PS2) 10Volans: wmf-auto-reimage: add support for rename [puppet] - 10https://gerrit.wikimedia.org/r/381200 (https://phabricator.wikimedia.org/T176955) [10:28:20] (03PS1) 10Filippo Giunchedi: hieradata: use syslog-tls (6514) port not 6154 [puppet] - 10https://gerrit.wikimedia.org/r/381750 [10:29:15] (03CR) 10Filippo Giunchedi: [C: 032] hieradata: use syslog-tls (6514) port not 6154 [puppet] - 10https://gerrit.wikimedia.org/r/381750 (owner: 10Filippo Giunchedi) [10:45:58] (03PS25) 10Paladox: Gerrit: Use systemd::service for systemd [puppet] - 10https://gerrit.wikimedia.org/r/378768 (https://phabricator.wikimedia.org/T157414) [11:23:33] (03PS1) 10Filippo Giunchedi: base: create /etc/rsyslog as required [puppet] - 10https://gerrit.wikimedia.org/r/381753 (https://phabricator.wikimedia.org/T136312) [11:34:24] 10Operations, 10DBA, 10procurement: Purchase sanitarium & backup tests hosts (4 hosts in total) - 
https://phabricator.wikimedia.org/T177203#3650318 (10Marostegui) [11:45:26] (03PS1) 10Marostegui: Revert "db-codfw.php: Depool db2039" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381756 [11:48:22] (03CR) 10Marostegui: [C: 032] Revert "db-codfw.php: Depool db2039" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381756 (owner: 10Marostegui) [11:49:56] (03Merged) 10jenkins-bot: Revert "db-codfw.php: Depool db2039" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381756 (owner: 10Marostegui) [11:50:09] (03CR) 10jenkins-bot: Revert "db-codfw.php: Depool db2039" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381756 (owner: 10Marostegui) [11:51:01] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2039 - T174509 (duration: 00m 46s) [11:51:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:51:07] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509 [11:59:50] (03CR) 10Filippo Giunchedi: [C: 032] base: create /etc/rsyslog as required [puppet] - 10https://gerrit.wikimedia.org/r/381753 (https://phabricator.wikimedia.org/T136312) (owner: 10Filippo Giunchedi) [12:13:37] 10Operations, 10DBA: Increase timeout for mariadb replication check - https://phabricator.wikimedia.org/T163303#3650383 (10Marostegui) 05Open>03declined With the full reimplementation of the backups/dbstore hosts, let's decline this. 
[12:29:06] 10Operations, 10ops-eqiad: labsdb1001: Investigate eth0 wrong negotiated interface speed - https://phabricator.wikimedia.org/T137555#3650414 (10Marostegui) 05stalled>03declined Let's close this as these servers will be decommissioned (hopefully 13th Dec) - T175086#3633776 [12:29:53] 10Operations, 10ops-eqiad, 10DBA: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3650418 (10Marostegui) [12:43:36] 10Operations, 10Goal: Provide dedicated database resources for wikidata - https://phabricator.wikimedia.org/T177208#3650457 (10Marostegui) [12:43:42] 10Operations, 10Goal: Provide dedicated database resources for wikidata - https://phabricator.wikimedia.org/T177208#3650469 (10Marostegui) p:05Triage>03Normal [12:44:49] 10Operations, 10Goal: Provide dedicated database resources for wikidata - https://phabricator.wikimedia.org/T177208#3650457 (10Marostegui) [12:45:23] 10Operations, 10Goal: Provide dedicated database resources for wikidata - https://phabricator.wikimedia.org/T177208#3650457 (10Marostegui) [12:46:11] 10Operations, 10Goal: Provide dedicated database resources for wikidata - https://phabricator.wikimedia.org/T177208#3650487 (10Marostegui) [12:48:25] (03PS1) 10ArielGlenn: move hardcoded hostnames out of script for rsync of dumps to peers [puppet] - 10https://gerrit.wikimedia.org/r/381760 [12:48:53] (03CR) 10jerkins-bot: [V: 04-1] move hardcoded hostnames out of script for rsync of dumps to peers [puppet] - 10https://gerrit.wikimedia.org/r/381760 (owner: 10ArielGlenn) [12:50:22] !log Optimize tables pagelinks and templatelinks on s5: dbstore2001  [12:50:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:54:09] (03PS2) 10ArielGlenn: move hardcoded hostnames out of script for rsync of dumps to peers [puppet] - 10https://gerrit.wikimedia.org/r/381760 [12:54:36] (03CR) 10jerkins-bot: [V: 04-1] move hardcoded hostnames out of script for rsync of dumps to 
peers [puppet] - 10https://gerrit.wikimedia.org/r/381760 (owner: 10ArielGlenn) [12:56:19] !log added two new mediawiki jobrunners - mw131[01] [12:56:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:53] (03PS3) 10ArielGlenn: move hardcoded hostnames out of script for rsync of dumps to peers [puppet] - 10https://gerrit.wikimedia.org/r/381760 [12:57:59] (03CR) 10jerkins-bot: [V: 04-1] move hardcoded hostnames out of script for rsync of dumps to peers [puppet] - 10https://gerrit.wikimedia.org/r/381760 (owner: 10ArielGlenn) [12:58:58] jouncebot: next [12:58:58] In 0 hour(s) and 1 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171002T1300) [12:59:46] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3650507 (10chasemp) Final status of etherpad we were using to coordinate migrations off labvirt1015 for posterity ```https://phabricator.wikimedia.org/T177164 https://phabri... [13:00:05] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the European Mid-day SWAT(Max 8 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171002T1300). [13:00:05] Amir1, hoo, Jayprakash12345, Framawiki, and Zoranzoki21: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:21] I can SWAT today [13:01:00] hoo: around for swat? 
I would like to start with your patch, since it will probably take long to merge [13:01:23] (03PS4) 10ArielGlenn: move hardcoded hostnames out of script for rsync of dumps to peers [puppet] - 10https://gerrit.wikimedia.org/r/381760 [13:01:28] it's just configuration, should be blazingly fast :) [13:01:29] o/ [13:01:52] hoo: oh, just noticed that, I thought it would be some crazy wikidata deploy ;) [13:02:16] ok, in that case, will deploy in the order on the wiki page [13:02:31] Amir1: your patches are first, will ping you once they are at mwdebug1002 [13:02:47] (03CR) 10Rush: "How would you feel about keeping the barest of this ideas w/ logging the commands run to syslog?" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/381713 (https://phabricator.wikimedia.org/T166712) (owner: 10BryanDavis) [13:02:55] zeljkof: the first one is labs only [13:03:03] (beta cluster) [13:03:23] Amir1: so I just merge it? or I still need to deploy? [13:04:09] zeljkof: just merge [13:04:14] zeljkof: that does not impact production [13:04:24] so +2 and pull on deployment server :) [13:04:31] also deploy for the sake of sync [13:04:46] ok, so a normal deploy then :) [13:05:06] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381529 (https://phabricator.wikimedia.org/T175109) (owner: 10Ladsgroup) [13:06:17] but without mwdebug I guess :d [13:07:49] (03Merged) 10jenkins-bot: labs: Use redis lock manager for dispatching changes of Wikibase [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381529 (https://phabricator.wikimedia.org/T175109) (owner: 10Ladsgroup) [13:08:00] (03CR) 10jenkins-bot: labs: Use redis lock manager for dispatching changes of Wikibase [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381529 (https://phabricator.wikimedia.org/T175109) (owner: 10Ladsgroup) [13:08:02] (03PS4) 10Zfilipin: Add 'eliminator' as a priviliged account [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379985 
(https://phabricator.wikimedia.org/T176554) (owner: 10Ladsgroup) [13:08:57] 10Operations, 10ops-eqiad, 10Cloud-Services, 10netops: labsdb1001's switch port negociating at 100M - https://phabricator.wikimedia.org/T177130#3650565 (10faidon) [13:09:00] 10Operations, 10ops-eqiad: labsdb1001: Investigate eth0 wrong negotiated interface speed - https://phabricator.wikimedia.org/T137555#3650568 (10faidon) [13:09:16] !log zfilipin@tin Synchronized wmf-config/Wikibase-labs.php: wmf-config/Wikibase.php wmf-config/Wikibase-production.php SWAT: [[gerrit:381529|labs: Use redis lock manager for dispatching changes of Wikibase (T175109)]] (duration: 00m 47s) [13:09:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:09:21] T175109: beta: Wikidata dispatchChanges.php causes a lot of "Wikimedia\Rdbms\DatabaseMysqlBase::unlock failed to release lock" - https://phabricator.wikimedia.org/T175109 [13:09:34] Amir1: 381529 is deployed, please check [13:10:01] Thanks I will keep an eye on logstash [13:10:28] 10Operations, 10ops-eqiad: adjust flerovium power draw - https://phabricator.wikimedia.org/T177131#3650573 (10faidon) Note that the storage shelves are only there temporarily, for 2-3 weeks. 
I'll leave the decision on whether to balance power in the meantime to you guys though, you know best :) [13:11:47] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379985 (https://phabricator.wikimedia.org/T176554) (owner: 10Ladsgroup) [13:13:25] hoo: please stand by, your patch is next, will ping you in a few minutes, as soon as it is at mwdebug1002 [13:13:41] You can skip that [13:13:47] there's hardly anything to verify [13:13:59] it's turning a not-yet deployed feature off basically [13:14:00] hoo: so a full deploy [13:14:06] ok [13:14:15] in that case, will ping you when deployed [13:14:56] (03PS5) 10ArielGlenn: move hardcoded hostnames out of script for rsync of dumps to peers [puppet] - 10https://gerrit.wikimedia.org/r/381760 [13:15:08] (03Merged) 10jenkins-bot: Add 'eliminator' as a priviliged account [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379985 (https://phabricator.wikimedia.org/T176554) (owner: 10Ladsgroup) [13:16:25] (03CR) 10jenkins-bot: Add 'eliminator' as a priviliged account [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379985 (https://phabricator.wikimedia.org/T176554) (owner: 10Ladsgroup) [13:17:16] !log zfilipin@tin Synchronized wmf-config/Wikibase-labs.php: wmf-config/Wikibase.php wmf-config/Wikibase-production.php SWAT: [[gerrit:381529|labs: Use redis lock manager for dispatching changes of Wikibase (T175109)]] (duration: 00m 46s) [13:17:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:17:21] T175109: beta: Wikidata dispatchChanges.php causes a lot of "Wikimedia\Rdbms\DatabaseMysqlBase::unlock failed to release lock" - https://phabricator.wikimedia.org/T175109 [13:17:54] 10Operations, 10Mail, 10Wikidata: Large number of "A page you created was linked on Wikidata" emails to one recipient in short period of time - https://phabricator.wikimedia.org/T177099#3650596 (10herron) The deferred message count for this recipient had increased to 4145. 
I've removed these from the queue... [13:18:00] Amir1: 379985 is at mwdebug1002, please let me know if I can do a full deploy [13:18:29] zeljkof: works just fine [13:18:34] (testing it was easy) [13:19:08] Amir1: deploying [13:19:21] Thanks [13:20:00] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:379985|Add eliminator as a priviliged account (T176554)]] (duration: 00m 46s) [13:20:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:20:05] T176554: Enable 2FA for eliminators - https://phabricator.wikimedia.org/T176554 [13:20:20] Amir1: all deployed, thanks for releasing with #releng ;) [13:20:34] Thank you for helping [13:20:53] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381615 (https://phabricator.wikimedia.org/T177153) (owner: 10Hoo man) [13:21:02] (03PS3) 10Zfilipin: Wikidata: Don't persist description usages (yet) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381615 (https://phabricator.wikimedia.org/T177153) (owner: 10Hoo man) [13:21:09] (03CR) 10Zfilipin: Wikidata: Don't persist description usages (yet) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381615 (https://phabricator.wikimedia.org/T177153) (owner: 10Hoo man) [13:21:15] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381615 (https://phabricator.wikimedia.org/T177153) (owner: 10Hoo man) [13:21:46] Amir1: I'm just doing the needful ;) https://phabricator.wikimedia.org/phame/blog/view/1/ [13:22:26] 10Operations: install2002 free disk space warning - https://phabricator.wikimedia.org/T177214#3650601 (10herron) [13:24:30] (03Merged) 10jenkins-bot: Wikidata: Don't persist description usages (yet) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381615 (https://phabricator.wikimedia.org/T177153) (owner: 10Hoo man) [13:25:23] Jayprakash12345, Framawiki, Zoranzoki21: around for SWAT? 
[13:25:34] (03PS4) 10Zfilipin: Enable the Extension:SandboxLink on the nlwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381634 (https://phabricator.wikimedia.org/T177170) (owner: 10Jayprakash12345) [13:26:14] (03CR) 10jenkins-bot: Wikidata: Don't persist description usages (yet) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381615 (https://phabricator.wikimedia.org/T177153) (owner: 10Hoo man) [13:26:41] (03PS2) 10Ottomata: Removing schema that no longer exists [puppet] - 10https://gerrit.wikimedia.org/r/381493 (https://phabricator.wikimedia.org/T171629) (owner: 10Nuria) [13:26:51] (03CR) 10Ottomata: [V: 032 C: 032] Removing schema that no longer exists [puppet] - 10https://gerrit.wikimedia.org/r/381493 (https://phabricator.wikimedia.org/T171629) (owner: 10Nuria) [13:27:05] !log zfilipin@tin Synchronized wmf-config/Wikibase-production.php: SWAT: [[gerrit:381615|Wikidata: Dont persist description usages (yet) (T177153)]] (duration: 00m 46s) [13:27:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:27:11] T177153: Add 'D' to $wgWBClientSettings['disabledUsageAspects'] - https://phabricator.wikimedia.org/T177153 [13:27:34] hoo: nevermind, looks like he is gone [13:28:14] hashar: around? the last four patches for swat are from 3 people, none of them around :| [13:28:19] should I just deploy? [13:28:26] or not deploy [13:28:31] the question is now... [13:28:45] (03CR) 10Rush: "I can see this working. It has a one intrinsic side effect of mixing assumed system users in data.yaml. 
It would be best to bring up in " (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/379004 (https://phabricator.wikimedia.org/T174465) (owner: 10Ottomata) [13:29:43] hoo: your patch is deployed, please check, if there is anything to check [13:29:57] hm, not showing up [13:30:40] 10Operations: install2002 free disk space warning - https://phabricator.wikimedia.org/T177214#3650633 (10herron) /srv is consuming the majority of the disk (65G). Are there files that can be deleted here? If not should we add a ~100G volume or so and create a separate /srv filesystem? [13:30:46] seems it's not synced [13:31:05] hoo: argh, my mistake, just a min [13:32:27] !log zfilipin@tin Synchronized wmf-config/Wikibase-production.php: SWAT: [[gerrit:381615|Wikidata: Dont persist description usages (yet) (T177153)]] (duration: 00m 45s) [13:32:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:32] T177153: Add 'D' to $wgWBClientSettings['disabledUsageAspects'] - https://phabricator.wikimedia.org/T177153 [13:33:29] looks good, awesome :) [13:33:31] thanks [13:33:41] (03PS1) 10Giuseppe Lavagetto: Rakefile: add wmf styleguide checks [puppet] - 10https://gerrit.wikimedia.org/r/381764 [13:33:45] hoo: should be deployed now [13:33:47] not my day... 
[13:33:58] hoo: thanks for releasing with #releng ;) [13:34:21] (03CR) 10Andrew Bogott: [C: 031] openstack: pdns fixup SOA default answer [puppet] - 10https://gerrit.wikimedia.org/r/381445 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [13:34:43] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381634 (https://phabricator.wikimedia.org/T177170) (owner: 10Jayprakash12345) [13:35:22] 10Operations, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Encrypt syslog traffic - https://phabricator.wikimedia.org/T136312#3650656 (10fgiunchedi) [13:36:29] (03Merged) 10jenkins-bot: Enable the Extension:SandboxLink on the nlwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381634 (https://phabricator.wikimedia.org/T177170) (owner: 10Jayprakash12345) [13:36:38] (03CR) 10jenkins-bot: Enable the Extension:SandboxLink on the nlwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381634 (https://phabricator.wikimedia.org/T177170) (owner: 10Jayprakash12345) [13:37:10] RECOVERY - exim queue on mx1001 is OK: OK: Less than 1000 mails in exim queue. [13:37:18] (03PS4) 10Rush: openstack: pdns fixup SOA default answer [puppet] - 10https://gerrit.wikimedia.org/r/381445 (https://phabricator.wikimedia.org/T171494) [13:37:47] (03CR) 10Filippo Giunchedi: "Curious, afaik we have the double a/aaaa ferm resolve() elsewhere too in beta (e.g. prometheus::apache_exporter) but it works afaik?" [puppet] - 10https://gerrit.wikimedia.org/r/381073 (https://phabricator.wikimedia.org/T176314) (owner: 10Hashar) [13:37:54] godog: can you close up your 'screen' session on labcontrol1001 if it's not doing anything? 
[13:38:09] (03PS3) 10Filippo Giunchedi: Thumbor: apply expensive type throttle to STL [puppet] - 10https://gerrit.wikimedia.org/r/381045 (https://phabricator.wikimedia.org/T166699) (owner: 10Gilles) [13:38:29] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:381634|Enable the Extension:SandboxLink on the nlwikinews (T177170)]] (duration: 00m 46s) [13:38:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:38:34] T177170: Add sandbox link to Dutch Wikinews - https://phabricator.wikimedia.org/T177170 [13:38:45] andrewbogott: yup, {{done}} [13:38:54] (03CR) 10Filippo Giunchedi: [C: 032] Thumbor: apply expensive type throttle to STL [puppet] - 10https://gerrit.wikimedia.org/r/381045 (https://phabricator.wikimedia.org/T166699) (owner: 10Gilles) [13:38:56] (03PS2) 10Zfilipin: Change the zh-classicalwiki logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381630 (https://phabricator.wikimedia.org/T177165) (owner: 10Jayprakash12345) [13:39:01] godog: thanks! 
[13:39:37] (03CR) 10Ottomata: Prometheus based Kafka broker alerts, take 1 (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/381489 (https://phabricator.wikimedia.org/T175923) (owner: 10Ottomata) [13:39:45] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381630 (https://phabricator.wikimedia.org/T177165) (owner: 10Jayprakash12345) [13:40:04] (03PS10) 10Ottomata: Prometheus based Kafka broker alerts, take 1 [puppet] - 10https://gerrit.wikimedia.org/r/381489 (https://phabricator.wikimedia.org/T175923) [13:42:27] (03Merged) 10jenkins-bot: Change the zh-classicalwiki logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381630 (https://phabricator.wikimedia.org/T177165) (owner: 10Jayprakash12345) [13:42:32] !log roll restart thumbor to apply https://gerrit.wikimedia.org/r/#/c/381045/ - T166699 [13:42:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:42:37] T166699: Evaluate performance of 3d2png on beta cluster - https://phabricator.wikimedia.org/T166699 [13:42:39] (03PS3) 10Zfilipin: Enable Extension:Newsletter on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379316 (https://phabricator.wikimedia.org/T176199) (owner: 10Framawiki) [13:42:41] (03CR) 10jenkins-bot: Change the zh-classicalwiki logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381630 (https://phabricator.wikimedia.org/T177165) (owner: 10Jayprakash12345) [13:43:14] (03CR) 10Herron: [C: 032] Change check_ipmi_temp to check_ipmi_sensor and monitor Power_Supply [puppet] - 10https://gerrit.wikimedia.org/r/376048 (https://phabricator.wikimedia.org/T109903) (owner: 10Herron) [13:44:33] !log zfilipin@tin Synchronized static/images/project-logos/zh_classicalwiki-1.5x.png: static/images/project-logos/zh_classicalwiki-2x.png static/images/project-logos/zh_classicalwiki.png wmf-config/InitialiseSettings.php SWAT: [[gerrit:381630|Change the zh-classicalwiki logo (T177165)]] (duration: 00m 46s) [13:44:37] 
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:44:38] T177165: Change the zh-classical Wikipedia logo - https://phabricator.wikimedia.org/T177165 [13:44:42] (03PS11) 10Ottomata: Prometheus based Kafka broker alerts, take 1 [puppet] - 10https://gerrit.wikimedia.org/r/381489 (https://phabricator.wikimedia.org/T175923) [13:45:16] (03PS1) 10DCausse: Adjust wgNamespacesToBeSearchedDefault for enwikibooks, fawikibooks and hewikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381766 (https://phabricator.wikimedia.org/T176906) [13:46:03] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Fix rubocop and add 'rake test' for CI [puppet-lint/wmf_styleguide-check] - 10https://gerrit.wikimedia.org/r/381744 (owner: 10Hashar) [13:46:26] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Add .gitreview file [puppet-lint/wmf_styleguide-check] - 10https://gerrit.wikimedia.org/r/381743 (owner: 10Hashar) [13:46:42] (03PS3) 10Herron: Change check_ipmi_temp to check_ipmi_sensor and monitor Power_Supply [puppet] - 10https://gerrit.wikimedia.org/r/376048 (https://phabricator.wikimedia.org/T109903) [13:47:17] (03PS2) 10DCausse: Adjust wgNamespacesToBeSearchedDefault for enwikibooks, fawikibooks and hewikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381766 (https://phabricator.wikimedia.org/T176906) [13:50:30] !log zfilipin@tin Synchronized static/images/project-logos/: SWAT: [[gerrit:381630|Change the zh-classicalwiki logo (T177165)]] (duration: 00m 46s) [13:50:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:36] T177165: Change the zh-classical Wikipedia logo - https://phabricator.wikimedia.org/T177165 [13:51:01] (03CR) 10Addshore: [C: 031] Stop using $wgWikibaseSharedCacheKeyPrefix from Wikidata build [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381371 (https://phabricator.wikimedia.org/T176948) (owner: 10Aude) [13:51:15] (03PS1) 10ArielGlenn: move hardcoded phab hostname out of rsync config 
for dataset hosts [puppet] - 10https://gerrit.wikimedia.org/r/381767 (https://phabricator.wikimedia.org/T175528) [13:51:43] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:381630|Change the zh-classicalwiki logo (T177165)]] (duration: 00m 45s) [13:51:44] (03CR) 10jerkins-bot: [V: 04-1] move hardcoded phab hostname out of rsync config for dataset hosts [puppet] - 10https://gerrit.wikimedia.org/r/381767 (https://phabricator.wikimedia.org/T175528) (owner: 10ArielGlenn) [13:51:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:53:39] (03PS2) 10ArielGlenn: move hardcoded phab hostname out of rsync config for dataset hosts [puppet] - 10https://gerrit.wikimedia.org/r/381767 (https://phabricator.wikimedia.org/T175528) [13:54:31] Hello zfilipin, is there a problem with [[gerrit:381630|Change the zh-classicalwiki logo (T177165)]]? [13:55:27] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379316 (https://phabricator.wikimedia.org/T176199) (owner: 10Framawiki) [13:55:51] Jayprakash12345: all should be fine, do you see something strange? [13:57:09] (03Merged) 10jenkins-bot: Enable Extension:Newsletter on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379316 (https://phabricator.wikimedia.org/T176199) (owner: 10Framawiki) [13:57:18] No, this is the first time I have seen Stashbot synchronize 3 times. 
[13:57:20] (03CR) 10jenkins-bot: Enable Extension:Newsletter on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379316 (https://phabricator.wikimedia.org/T176199) (owner: 10Framawiki) [13:57:22] (03PS1) 10Filippo Giunchedi: hieradata: enable syslog over tls for ulsfo/esams [puppet] - 10https://gerrit.wikimedia.org/r/381768 (https://phabricator.wikimedia.org/T136312) [13:57:47] (03PS3) 10ArielGlenn: move hardcoded phab hostname out of rsync config for dataset hosts [puppet] - 10https://gerrit.wikimedia.org/r/381767 (https://phabricator.wikimedia.org/T175528) [13:58:27] (03CR) 10ArielGlenn: [C: 032] move hardcoded phab hostname out of rsync config for dataset hosts [puppet] - 10https://gerrit.wikimedia.org/r/381767 (https://phabricator.wikimedia.org/T175528) (owner: 10ArielGlenn) [13:59:19] Jayprakash12345: I was not able to check if things were fine and was not sure if I had made a mistake somewhere; everything worked fine once I purged the caches :) [13:59:37] 10Operations, 10Goal, 10Technical-Debt: Reduce technical debt in metrics monitoring - https://phabricator.wikimedia.org/T177195#3650796 (10Aklapper) [13:59:38] (03PS3) 10Zfilipin: Enable Sandbox menu to Javanese Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381718 (https://phabricator.wikimedia.org/T176308) (owner: 10Zoranzoki21) [13:59:59] !log extending EU SWAT for 5-10 minutes [14:00:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:01:20] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:379316|Enable Extension:Newsletter on officewiki (T176199)]] (duration: 00m 46s) [14:01:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:01:27] T176199: Enable Newsletter extension in office.wikimedia.org (and Structured Discussions in its related Talk namespace) - https://phabricator.wikimedia.org/T176199 [14:01:30] (03CR) 10Elukey: [C: 031] "lgtm 
https://puppet-compiler.wmflabs.org/compiler02/8137/kafka-jumbo1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/381489 (https://phabricator.wikimedia.org/T175923) (owner: 10Ottomata) [14:02:40] gr8! [14:03:04] 10Operations, 10monitoring: Review check_raid_hpssacli frequency - https://phabricator.wikimedia.org/T173311#3650803 (10faidon) Given that we're not under load/pressure and increasing the frequency has the potential of hiding issues during troubleshooting, I'd be inclined to leave it unchanged, at 10 minutes.... [14:03:12] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381718 (https://phabricator.wikimedia.org/T176308) (owner: 10Zoranzoki21) [14:03:22] (03CR) 10Ottomata: [C: 032] Prometheus based Kafka broker alerts, take 1 [puppet] - 10https://gerrit.wikimedia.org/r/381489 (https://phabricator.wikimedia.org/T175923) (owner: 10Ottomata) [14:03:26] (03PS12) 10Ottomata: Prometheus based Kafka broker alerts, take 1 [puppet] - 10https://gerrit.wikimedia.org/r/381489 (https://phabricator.wikimedia.org/T175923) [14:03:38] (03CR) 10Ottomata: [V: 032 C: 032] Prometheus based Kafka broker alerts, take 1 [puppet] - 10https://gerrit.wikimedia.org/r/381489 (https://phabricator.wikimedia.org/T175923) (owner: 10Ottomata) [14:04:31] 10Operations, 10monitoring, 10Patch-For-Review: add pdu redundancy checking to server/router/switch checks in icinga - https://phabricator.wikimedia.org/T109903#3650809 (10herron) Ipmi power_supply monitoring is looking good so far. Alerts should start trickling in over the next hour or two. ``` cp4007 C... 
[14:06:39] (03Merged) 10jenkins-bot: Enable Sandbox menu to Javanese Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381718 (https://phabricator.wikimedia.org/T176308) (owner: 10Zoranzoki21) [14:06:49] (03CR) 10jenkins-bot: Enable Sandbox menu to Javanese Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381718 (https://phabricator.wikimedia.org/T176308) (owner: 10Zoranzoki21) [14:07:29] 10Operations, 10monitoring: Review check_ping settings - https://phabricator.wikimedia.org/T173315#3650816 (10faidon) I'm not sure if a different implementation (like fping) is going to make a difference. check_ping is "slow" because it sends multiple packets over 1-second intervals -- that will always consume... [14:07:55] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:381718|Enable Sandbox menu to Javanese Wikipedia (T176308)]] (duration: 00m 46s) [14:08:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:08:01] T176308: Adding Sandbox menu to Javanese Wikipedia - https://phabricator.wikimedia.org/T176308 [14:08:12] !log EU SWAT finished [14:08:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:09:15] 10Operations, 10monitoring: Review check_puppetrun frequency - https://phabricator.wikimedia.org/T173427#3650822 (10faidon) I think the proposal is to bump check interval from 1 minute to 5 minutes, right? Any other actionables here? 
[14:12:29] (03PS1) 10ArielGlenn: move hardcoded refs to labstore hosts from dump manifests to profile [puppet] - 10https://gerrit.wikimedia.org/r/381770 (https://phabricator.wikimedia.org/T175528) [14:14:00] 10Operations, 10Discovery, 10Elasticsearch, 10Wikimedia-Logstash, and 2 others: logs sent to logstash are lost when the elasticsearch cirrus cluster is unavailable - https://phabricator.wikimedia.org/T176335#3650849 (10debt) [14:14:03] 10Operations, 10Discovery, 10Elasticsearch, 10Wikimedia-Logstash, and 3 others: api feature logs should be sent to both eqiad and codfw clusters - https://phabricator.wikimedia.org/T176430#3650848 (10debt) 05Open>03Resolved [14:14:28] 10Operations: install2002 free disk space warning - https://phabricator.wikimedia.org/T177214#3650601 (10faidon) install1002 and install2002 should be identical, so why one is alerting and the other one isn't? I think there's an rsync from one to the other to keep them in sync, perhaps we aren't passing `--delet... [14:18:54] (03CR) 10Filippo Giunchedi: "PCC says yes https://puppet-compiler.wmflabs.org/compiler02/8142/" [puppet] - 10https://gerrit.wikimedia.org/r/381768 (https://phabricator.wikimedia.org/T136312) (owner: 10Filippo Giunchedi) [14:20:21] (03PS2) 10Filippo Giunchedi: hieradata: enable syslog over tls for ulsfo/esams [puppet] - 10https://gerrit.wikimedia.org/r/381768 (https://phabricator.wikimedia.org/T136312) [14:22:42] (03PS2) 10ArielGlenn: move hardcoded refs to labstore hosts from dump manifests to profile [puppet] - 10https://gerrit.wikimedia.org/r/381770 (https://phabricator.wikimedia.org/T175528) [14:23:17] (03CR) 10jerkins-bot: [V: 04-1] move hardcoded refs to labstore hosts from dump manifests to profile [puppet] - 10https://gerrit.wikimedia.org/r/381770 (https://phabricator.wikimedia.org/T175528) (owner: 10ArielGlenn) [14:24:12] PROBLEM - Kafka Broker Replica Max Lag on kafka-jumbo1006 is CRITICAL: The command defined for service Kafka Broker Replica Max Lag does not exist 
[14:25:29] ! :) [14:25:30] looking [14:26:12] PROBLEM - Kafka Broker Replica Max Lag on kafka-jumbo1004 is CRITICAL: The command defined for service Kafka Broker Replica Max Lag does not exist [14:26:34] volunteers for stamping +1 on https://gerrit.wikimedia.org/r/#/c/381768 ? [14:26:46] (03PS3) 10ArielGlenn: move hardcoded refs to labstore hosts from dump manifests to profile [puppet] - 10https://gerrit.wikimedia.org/r/381770 (https://phabricator.wikimedia.org/T175528) [14:27:08] 10Operations, 10OCG-General, 10Reading-Community-Engagement, 10Epic, and 3 others: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service - https://phabricator.wikimedia.org/T150871#3650893 (10ovasileva) [14:28:02] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1006 is CRITICAL: The command defined for service Kafka Broker Under Replicated Partitions does not exist [14:29:28] godog: any idea why check_prometheus_metric doesn't exist on kafka-jumbo hosts...reading puppet, it looks like it should be there [14:29:53] PROBLEM - Kafka Broker Replica Max Lag on kafka-jumbo1002 is CRITICAL: The command defined for service Kafka Broker Replica Max Lag does not exist [14:29:53] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1004 is CRITICAL: The command defined for service Kafka Broker Under Replicated Partitions does not exist [14:30:23] ottomata: no idea off the bat, though it should be on icinga hosts not on kafka-jumbo ? 
iirc it doesn't use nrpe [14:30:34] oh [14:30:36] ok [14:30:40] right right [14:32:29] (03PS4) 10ArielGlenn: move hardcoded refs to labstore hosts from dump manifests to profile [puppet] - 10https://gerrit.wikimedia.org/r/381770 (https://phabricator.wikimedia.org/T175528) [14:33:43] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1002 is CRITICAL: The command defined for service Kafka Broker Under Replicated Partitions does not exist [14:35:42] PROBLEM - Kafka Broker Replica Max Lag on kafka-jumbo1003 is CRITICAL: The command defined for service Kafka Broker Replica Max Lag does not exist [14:35:48] hmm [14:36:00] elukey: from einsteinium [14:36:02] curl http://prometheus.svc.eqiad.wmnet/ops/metrics | grep kafka [14:36:09] only shows a few metrics about scrape jobs [14:36:14] maybe that's the wrong url? [14:36:39] godog: ^ ? [14:36:58] (03PS4) 10Jayprakash12345: Enable ShortUrl Extension on hiwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381714 (https://phabricator.wikimedia.org/T177187) [14:37:19] (03CR) 10Jayprakash12345: "check" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381714 (https://phabricator.wikimedia.org/T177187) (owner: 10Jayprakash12345) [14:37:30] ottomata: checking [14:37:33] PROBLEM - Kafka Broker Replica Max Lag on kafka-jumbo1001 is CRITICAL: The command defined for service Kafka Broker Replica Max Lag does not exist [14:37:33] PROBLEM - Kafka Broker Replica Max Lag on kafka-jumbo1005 is CRITICAL: The command defined for service Kafka Broker Replica Max Lag does not exist [14:37:42] I am also going to downtime kafka-jumbo [14:37:56] (03PS4) 10Jayprakash12345: Enable NewUserMessage Extension on hiwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381715 (https://phabricator.wikimedia.org/T177188) [14:38:25] k [14:38:53] ottomata: that's the url for metrics from prometheus server itself, though the url you got in the alert should be correct [14:39:03] the check will use the query 
endpoint to run the query [14:40:03] PROBLEM - IPMI Sensor Status on cp3034 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:40:10] (03PS2) 10Jayprakash12345: Change the Turkish Wiktionary logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381650 (https://phabricator.wikimedia.org/T176008) [14:40:12] PROBLEM - IPMI Sensor Status on lvs4002 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:40:16] ah i see godog [14:40:17] ok [14:42:02] PROBLEM - IPMI Sensor Status on analytics1002 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:42:02] PROBLEM - IPMI Sensor Status on cp3032 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:42:02] PROBLEM - IPMI Sensor Status on cp3039 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:42:03] PROBLEM - IPMI Sensor Status on kafka1022 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:42:03] PROBLEM - IPMI Sensor Status on kafka1018 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:45:52] PROBLEM - IPMI Sensor Status on cp3031 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:45:52] PROBLEM - IPMI Sensor Status on cp3030 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:45:58] <_joe_> wat [14:46:03] these are new checks [14:46:09] ^ new ipmi power supply checks [14:46:10] we didn't have PS redundancy checks before :) [14:46:12] PROBLEM - IPMI Sensor Status on wtp1018 is CRITICAL: Sensor Type(s) 
Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:46:12] PROBLEM - IPMI Sensor Status on wtp1024 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:46:15] and apparently lots of broken ones :) [14:46:18] <_joe_> oh I see [14:46:21] <_joe_> sigh, yeah [14:46:22] <_joe_> :P [14:47:52] PROBLEM - IPMI Sensor Status on elastic1022 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:49:56] 10Operations, 10Fundraising-Backlog, 10fundraising-tech-ops: Port fundraising stats off Ganglia - https://phabricator.wikimedia.org/T152562#3650967 (10Jgreen) [14:49:59] 10Operations, 10fundraising-tech-ops, 10netops: remove fundraising firewall rules related to ganglia - https://phabricator.wikimedia.org/T176319#3650965 (10Jgreen) 05Open>03Resolved this is done [14:51:32] PROBLEM - IPMI Sensor Status on analytics1035 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical, Status = Critical] [14:51:33] PROBLEM - IPMI Sensor Status on conf1003 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Status = Critical, PS Redundancy = Critical] [14:51:42] PROBLEM - IPMI Sensor Status on kafka1020 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Status = Critical] [14:52:46] (03PS1) 10Ottomata: check_prometheus_metric - output error from Prometheus query [puppet] - 10https://gerrit.wikimedia.org/r/381778 [14:53:14] (03PS2) 10Ottomata: check_prometheus_metric - output error from Prometheus query [puppet] - 10https://gerrit.wikimedia.org/r/381778 [14:53:23] godog: ^ [14:53:32] PROBLEM - IPMI Sensor Status on cp3037 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:53:34] PROBLEM - IPMI Sensor Status on cp3036 is CRITICAL: Sensor Type(s) 
Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:53:34] PROBLEM - IPMI Sensor Status on db1052 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:53:42] PROBLEM - IPMI Sensor Status on mw1200 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:54:06] 10Operations, 10monitoring: Review check_ping settings - https://phabricator.wikimedia.org/T173315#3650983 (10herron) We could leave it alone. My main concern was around reducing check concurrency after seeing the parent icinga process sitting at 100% cpu each time I looked, but I don't have a good trend to h... [14:55:23] PROBLEM - IPMI Sensor Status on cp3035 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [14:56:11] well, apparently it's a valuable check to have in place :) [14:56:19] (03PS1) 10Ottomata: Quote Prometheus query label values in kafka broker monitoring [puppet] - 10https://gerrit.wikimedia.org/r/381779 [14:56:46] <_joe_> bblack: unless some of those are false positives, which given it goes through ipmi, seems very possible :) [14:56:47] (03CR) 10ArielGlenn: [C: 032] move hardcoded refs to labstore hosts from dump manifests to profile [puppet] - 10https://gerrit.wikimedia.org/r/381770 (https://phabricator.wikimedia.org/T175528) (owner: 10ArielGlenn) [14:56:49] (03CR) 10jerkins-bot: [V: 04-1] Quote Prometheus query label values in kafka broker monitoring [puppet] - 10https://gerrit.wikimedia.org/r/381779 (owner: 10Ottomata) [14:57:33] yeah but power supply failures aren't that uncommon, either. if we've gone years without looking for them, other than occasionally looking at the BMC's SEL when some other issue happens or when both PSUs fail... 
[14:57:43] (03PS2) 10Ottomata: Quote Prometheus query label values in kafka broker monitoring [puppet] - 10https://gerrit.wikimedia.org/r/381779 (https://phabricator.wikimedia.org/T381489) [14:57:44] there could be quite a few legit single-psu-failure cases waiting out there [14:58:20] (03CR) 10Ottomata: [C: 032] Quote Prometheus query label values in kafka broker monitoring [puppet] - 10https://gerrit.wikimedia.org/r/381779 (https://phabricator.wikimedia.org/T381489) (owner: 10Ottomata) [14:58:26] (03PS3) 10Ottomata: Quote Prometheus query label values in kafka broker monitoring [puppet] - 10https://gerrit.wikimedia.org/r/381779 (https://phabricator.wikimedia.org/T381489) [14:58:28] (03CR) 10Ottomata: [V: 032 C: 032] Quote Prometheus query label values in kafka broker monitoring [puppet] - 10https://gerrit.wikimedia.org/r/381779 (https://phabricator.wikimedia.org/T381489) (owner: 10Ottomata) [14:59:22] (03CR) 10Filippo Giunchedi: "LGTM, I mentioned the unrelated changes because if the change is useful in general we should be sending it upstream at https://github.com/" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/381778 (owner: 10Ottomata) [15:03:34] (03PS3) 10Ottomata: check_prometheus_metric - output error from Prometheus query [puppet] - 10https://gerrit.wikimedia.org/r/381778 [15:03:52] (03CR) 10Ottomata: check_prometheus_metric - output error from Prometheus query (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/381778 (owner: 10Ottomata) [15:04:37] (03CR) 10Mforns: [C: 04-1] Implement Schema:Print purging strategy (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/379829 (https://phabricator.wikimedia.org/T175395) (owner: 10Bmansurov) [15:05:02] (03PS4) 10Ottomata: check_prometheus_metric - output error from Prometheus query [puppet] - 10https://gerrit.wikimedia.org/r/381778 [15:07:07] 10Operations, 10MediaWiki-Platform-Team, 10Performance-Team, 10HHVM: Convert Wikimedia production HHVM instances to have hhvm.php7.all set true - 
https://phabricator.wikimedia.org/T173786#3539734 (10Anomie) The HHVM documentation for the setting states that it controls the following: # Disallow and warn w... [15:12:50] PROBLEM - IPMI Sensor Status on db1054 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [15:14:59] PROBLEM - IPMI Sensor Status on mw2160 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [15:16:01] (03PS1) 10ArielGlenn: move hardcoded mwlog hostnames from dump rsync manifests to profile [puppet] - 10https://gerrit.wikimedia.org/r/381784 (https://phabricator.wikimedia.org/T175528) [15:16:24] (03CR) 10jerkins-bot: [V: 04-1] move hardcoded mwlog hostnames from dump rsync manifests to profile [puppet] - 10https://gerrit.wikimedia.org/r/381784 (https://phabricator.wikimedia.org/T175528) (owner: 10ArielGlenn) [15:16:49] PROBLEM - IPMI Sensor Status on mw2176 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [15:16:59] PROBLEM - IPMI Sensor Status on wtp1021 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [15:18:29] PROBLEM - IPMI Sensor Status on analytics1036 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical, Status = Critical] [15:20:14] (03PS2) 10ArielGlenn: move hardcoded mwlog hostnames from dump rsync manifests to profile [puppet] - 10https://gerrit.wikimedia.org/r/381784 (https://phabricator.wikimedia.org/T175528) [15:20:29] PROBLEM - IPMI Sensor Status on cp3038 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [15:20:29] PROBLEM - IPMI Sensor Status on db1080 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [Power Supply 2 = Critical, Power Supplies = Critical] [15:24:19] 
PROBLEM - IPMI Sensor Status on analytics1037 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical, Status = Critical] [15:24:29] PROBLEM - IPMI Sensor Status on mw1203 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [15:25:06] it's interesting that all of cp303x are showing the psu issue and none of cp304x. likely there's a whole power rail missing there on some rack...? [15:25:47] (03CR) 10EBernhardson: [C: 031] Adjust wgNamespacesToBeSearchedDefault for enwikibooks, fawikibooks and hewikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381766 (https://phabricator.wikimedia.org/T176906) (owner: 10DCausse) [15:26:19] they're all together in the first rack heh, just them and a switch [15:26:54] (03CR) 10ArielGlenn: [C: 032] move hardcoded mwlog hostnames from dump rsync manifests to profile [puppet] - 10https://gerrit.wikimedia.org/r/381784 (https://phabricator.wikimedia.org/T175528) (owner: 10ArielGlenn) [15:26:55] oh that is interesting [15:27:03] makes sense [15:27:06] 10Operations, 10Cassandra, 10Epic, 10Goal, and 2 others: End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939#3651053 (10Eevans) Prometheus dashboards (still a work-in-progress), are [[ https://grafana.wikimedia.org/dashboard/db/cassandra?orgId=1 | here ]]... 
[15:27:23] 10Operations, 10Cassandra, 10Epic, 10Goal, and 2 others: End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939#3651055 (10Eevans) [15:27:34] 10Operations, 10Cassandra, 10Epic, 10Goal, and 2 others: End of August milestone: Cassandra 3 cluster in production - https://phabricator.wikimedia.org/T169939#3532203 (10Eevans) 05Open>03Resolved [15:28:09] PROBLEM - IPMI Sensor Status on cp3033 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [15:28:22] herron: ^^^ [15:28:25] FYI :) [15:28:40] urandom: any opinions here? https://phabricator.wikimedia.org/T177078 [15:32:08] ottomata: yes. [15:34:31] 10Operations, 10monitoring: Review check_raid_hpssacli frequency - https://phabricator.wikimedia.org/T173311#3651100 (10akosiaris) a:03akosiaris [15:35:05] there also appears to be some commonality between PSU alerts and eqiad D8 [15:35:05] D8: Add basic .arclint that will handle pep8 and pylint checks - https://phabricator.wikimedia.org/D8 [15:35:20] 10Operations, 10Contributors-Team, 10MobileFrontend, 10wikidiff2, and 2 others: Diff page consistently produces 503 on beta cluster on first visit - https://phabricator.wikimedia.org/T176637#3632060 (10jkroll) This seems to be related to the new wikidiff2. I wrote the moved-paragraphs patch, and I have bee... [15:35:31] 10Operations, 10monitoring: Review check_raid_hpssacli frequency - https://phabricator.wikimedia.org/T173311#3523904 (10akosiaris) Let's just increase the `retry_interval` from 5 to 10. There is no chance a disk is going to go back from failing to OK and the retries should be enough anyway to fix transient net... 
[15:35:55] I was initially thinking to create tasks and ack each alert but maybe it makes more sense to create tasks to check out the PDUs first [15:36:48] (03PS5) 10Zoranzoki21: Enable Extension:Newsletter on hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381537 (https://phabricator.wikimedia.org/T177151) [15:36:58] 10Operations, 10monitoring: Review check_puppetrun frequency - https://phabricator.wikimedia.org/T173427#3651110 (10akosiaris) a:03akosiaris Yes, let's just bump the check interval a bit. 5 mins sounds ok and given the on-demand nature of schedules on failing services on hosts we can be pretty sure we won't... [15:37:01] (03PS14) 10Zoranzoki21: Enable RemexHTML on wikitech and eswikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T175971) [15:38:24] 10Operations, 10monitoring: Fix permissions for systemd file - https://phabricator.wikimedia.org/T155869#3651129 (10akosiaris) a:03akosiaris [15:38:46] 10Operations, 10monitoring: Uninstall ganglia from the fleet - https://phabricator.wikimedia.org/T177225#3651131 (10fgiunchedi) [15:39:17] 10Operations, 10Icinga, 10monitoring: re-create script for manual paging - https://phabricator.wikimedia.org/T82937#907272 (10faidon) a:03Dzahn [15:41:04] 10Operations, 10DC-Ops: Multiple servers in equad D8 showing PSU failures - https://phabricator.wikimedia.org/T177227#3651165 (10herron) [15:42:27] (03CR) 10Filippo Giunchedi: [C: 031] check_prometheus_metric - output error from Prometheus query [puppet] - 10https://gerrit.wikimedia.org/r/381778 (owner: 10Ottomata) [15:42:42] 10Operations, 10ops-eqiad, 10DC-Ops: Multiple servers in eqiad D8 showing PSU failures - https://phabricator.wikimedia.org/T177227#3651181 (10faidon) p:05Triage>03High a:03Cmjohnson [15:43:23] 10Operations, 10DC-Ops: Multiple systems in esams OE10 showing PSU failures - https://phabricator.wikimedia.org/T177228#3651185 (10herron) [15:44:16] 10Operations, 10monitoring: Review 
check_ping settings - https://phabricator.wikimedia.org/T173315#3651201 (10faidon) 05Open>03declined No, no per-process statistics that I know of :( Declining this, for now at least, per feedback I've heard from you and others :) [15:45:07] ottomata: i dread the day that you take up *my* name [15:45:23] 10Operations, 10monitoring: Review check_ping settings - https://phabricator.wikimedia.org/T173315#3651210 (10herron) Fair enough :) [15:45:42] 10Operations, 10ops-eqiad, 10DC-Ops: Multiple servers in eqiad D8 showing PSU failures - https://phabricator.wikimedia.org/T177227#3651211 (10Marostegui) //cc @jcrespo db1102 (sanitarium host) and es1019 (a slave) are there [15:45:46] herron: there are ops-$site phabricator tags as well, usually easier to add them if you know the exact site that an issue is for [15:46:03] ahh, good to know! [15:46:11] :) [15:46:20] PROBLEM - HHVM rendering on mw2143 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:46:41] 10Operations, 10ops-esams, 10DC-Ops: Multiple systems in esams OE10 showing PSU failures - https://phabricator.wikimedia.org/T177228#3651214 (10herron) [15:47:09] it should be the hhvm-restart script running -^ [15:47:10] RECOVERY - HHVM rendering on mw2143 is OK: HTTP OK: HTTP/1.1 200 OK - 79102 bytes in 0.292 second response time [15:47:37] 10Operations, 10monitoring, 10Patch-For-Review: Add PDU redundancy server/router/switch checks in Icinga - https://phabricator.wikimedia.org/T109903#3651221 (10faidon) a:03herron [15:47:52] 10Operations, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Encrypt syslog traffic - https://phabricator.wikimedia.org/T136312#3651223 (10faidon) a:03fgiunchedi [15:52:13] 10Operations, 10Traffic, 10monitoring, 10Patch-For-Review: prometheus -> grafana stats for per-numa-node meminfo - https://phabricator.wikimedia.org/T175636#3651253 (10BBlack) p:05Normal>03Low [15:54:13] (03PS1) 10ArielGlenn: move hardcoded path to public dumps dir out of most dumps rsync manifests [puppet] - 
10https://gerrit.wikimedia.org/r/381793 (https://phabricator.wikimedia.org/T175528) [15:54:40] (03CR) 10jerkins-bot: [V: 04-1] move hardcoded path to public dumps dir out of most dumps rsync manifests [puppet] - 10https://gerrit.wikimedia.org/r/381793 (https://phabricator.wikimedia.org/T175528) (owner: 10ArielGlenn) [15:56:08] (03PS3) 10Filippo Giunchedi: hieradata: enable syslog over tls for ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/381768 (https://phabricator.wikimedia.org/T136312) [15:56:40] volans: ^ [15:57:20] godog: thanks! [15:59:04] godog: it seems a bit fragile the puppetization... if $remote_syslog { and this will be an empty array, that in Ruby is truthy... but... :) [16:01:26] heh, you are right it works by chance now, I'll send a patch tomorrow [16:07:10] urandom: haha [16:09:14] thanks! [16:10:55] (03PS1) 10DCausse: Use the official RC1 release of the ltr plugin [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/381798 [16:17:21] 10Operations, 10Patch-For-Review: create endowment.wm.org microsite - https://phabricator.wikimedia.org/T136735#2346094 (10kaythaney) hi all, update from friday's call with marc, daniel, brandon and i. i'm currently working to schedule time with our wordpress VIP reps to discuss their ability to handle email f... [16:21:28] 10Operations, 10MediaWiki-Platform-Team, 10TechCom-RfC, 10HHVM, 10NewPHP: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3651391 (10mmodell) [16:23:21] 10Operations, 10Traffic, 10Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3651406 (10BBlack) This task hasn't been updated for various IRC/Hangouts discussions since. We did decide to move forward with V5 upgrades. Arzhel has built preliminary packages, and we ha... 
[16:23:51] 10Operations, 10Traffic, 10Performance-Team (Radar): Upgrade cache_misc to Varnish 5 - https://phabricator.wikimedia.org/T177233#3651409 (10BBlack) [16:32:25] 10Operations, 10Goal: Provide dedicated database resources for wikidata - https://phabricator.wikimedia.org/T177208#3651440 (10jcrespo) [16:32:27] 10Operations, 10Goal: Improve database backups' coverage, monitoring and data recovery time (part 1) (tracking) - https://phabricator.wikimedia.org/T169658#3651441 (10jcrespo) [16:38:30] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1001 is CRITICAL: The command defined for service Kafka Broker Under Replicated Partitions does not exist [16:38:50] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1003 is CRITICAL: The command defined for service Kafka Broker Under Replicated Partitions does not exist [16:39:02] dunno why its doing that yet. the check_prometheus_metric command works from einsteinium [16:39:39] PROBLEM - Kafka Broker Under Replicated Partitions on kafka-jumbo1005 is CRITICAL: The command defined for service Kafka Broker Under Replicated Partitions does not exist [16:40:48] 10Operations, 10Beta-Cluster-Infrastructure, 10media-storage, 10Patch-For-Review: nscd does not cache localhost causing high CPU usage when localhost is often resolved - https://phabricator.wikimedia.org/T171745#3651496 (10greg) [16:42:34] 10Operations, 10ORES, 10Graphite, 10Scoring-platform-team (Current), 10User-fgiunchedi: Regularly purge old ores graphite metrics - https://phabricator.wikimedia.org/T169969#3651498 (10awight) Copying an inventory of metrics here might help. [16:51:04] 10Operations: install2002 free disk space warning - https://phabricator.wikimedia.org/T177214#3651530 (10herron) >>! In T177214#3650850, @faidon wrote: > install1002 and install2002 should be identical, so why one is alerting and the other one isn't? I think there's an rsync from one to the other to keep them in... 
[16:51:41] 10Operations, 10Collection, 10OfflineContentGenerator, 10Readers-Web-Backlog, 10Services (watching): Remove deprecated features from book creator UI - https://phabricator.wikimedia.org/T150917#3651537 (10CKoerner_WMF) [16:53:14] 10Operations, 10ops-eqiad, 10DBA: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3651543 (10Cmjohnson) @Marostegui Please let me know when you're available this week. [16:53:48] 10Operations, 10ops-eqiad, 10DBA: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3651547 (10Marostegui) >>! In T174054#3651543, @Cmjohnson wrote: > @Marostegui Please let me know when you're available this week. What about Thursday? [16:55:21] 10Operations, 10ops-eqiad, 10DBA: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3651551 (10Cmjohnson) Thursday works [16:56:33] 10Operations, 10ops-eqiad, 10DBA: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3651555 (10Marostegui) >>! In T174054#3651551, @Cmjohnson wrote: > Thursday works Awesome, ping me when you get online! Thank you! [16:59:43] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): labvirt1015 crashes - https://phabricator.wikimedia.org/T171473#3651563 (10Cmjohnson) The CPU failed again over the weekend, Record: 2 Date/Time: 10/01/2017 01:21:53 Source: system Severity: Critical Description: CPU 2 ma... [17:00:04] gehel: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171002T1700). [17:00:05] No GERRIT patches in the queue for this window AFAICS. 
[17:01:42] 10Operations, 10Patch-For-Review: create endowment.wm.org microsite - https://phabricator.wikimedia.org/T136735#3651566 (10Dzahn) @kaythaney Thank you for the update. It's appreciated!:) [17:02:50] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 232, down: 1, dormant: 0, excluded: 0, unused: 0 [17:09:05] (03PS1) 10Dzahn: screen-monitor: whitelist rhenium [puppet] - 10https://gerrit.wikimedia.org/r/381810 [17:09:20] PROBLEM - puppet last run on lvs4003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:11:09] (03PS2) 10Dzahn: screen-monitor: whitelist rhenium [puppet] - 10https://gerrit.wikimedia.org/r/381810 [17:13:40] (03PS2) 10ArielGlenn: move hardcoded path to public dumps dir out of most dumps rsync manifests [puppet] - 10https://gerrit.wikimedia.org/r/381793 (https://phabricator.wikimedia.org/T175528) [17:15:45] (03CR) 10Andrew Bogott: [C: 032] wmcs: Make /usr/local/bin/log-command-invocation a no-op [puppet] - 10https://gerrit.wikimedia.org/r/381712 (https://phabricator.wikimedia.org/T166712) (owner: 10BryanDavis) [17:15:52] (03PS2) 10Andrew Bogott: wmcs: Make /usr/local/bin/log-command-invocation a no-op [puppet] - 10https://gerrit.wikimedia.org/r/381712 (https://phabricator.wikimedia.org/T166712) (owner: 10BryanDavis) [17:20:09] 10Operations, 10Release-Engineering-Team, 10Epic, 10Services (watching): FY2017/18 Program 6 - Outcome 2 - Objective 2: Set up a continuous integration and deployment pipeline - https://phabricator.wikimedia.org/T170481#3651680 (10thcipriani) [17:20:43] 10Operations, 10Release-Engineering-Team, 10Epic, 10Services (watching): FY2017/18 Program 6 - Outcome 2 - Objective 2: Set up a continuous integration and deployment pipeline - https://phabricator.wikimedia.org/T170481#3432906 (10thcipriani) [17:23:37] PROBLEM - Host db2044.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [17:23:40] 10Operations, 10OCG-General, 
10Reading-Community-Engagement, 10Epic, and 3 others: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service - https://phabricator.wikimedia.org/T150871#3651691 (10phuedx) [17:27:48] (03PS3) 10Dzahn: screen-monitor: whitelist rhenium [puppet] - 10https://gerrit.wikimedia.org/r/381810 [17:28:32] (03CR) 10Dzahn: [C: 032] screen-monitor: whitelist rhenium [puppet] - 10https://gerrit.wikimedia.org/r/381810 (owner: 10Dzahn) [17:36:06] (03PS1) 10Herron: Aptrepo: Add --delete flag to rsync-aptrepo cron job [puppet] - 10https://gerrit.wikimedia.org/r/381813 (https://phabricator.wikimedia.org/T177214) [17:37:05] (03PS5) 10Rush: openstack: pdns fixup SOA default answer [puppet] - 10https://gerrit.wikimedia.org/r/381445 (https://phabricator.wikimedia.org/T171494) [17:37:48] RECOVERY - puppet last run on lvs4003 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [17:39:22] (03PS3) 10ArielGlenn: move hardcoded path to public dumps dir out of most dumps rsync manifests [puppet] - 10https://gerrit.wikimedia.org/r/381793 (https://phabricator.wikimedia.org/T175528) [17:40:03] (03CR) 10ArielGlenn: [C: 032] move hardcoded path to public dumps dir out of most dumps rsync manifests [puppet] - 10https://gerrit.wikimedia.org/r/381793 (https://phabricator.wikimedia.org/T175528) (owner: 10ArielGlenn) [17:41:36] (03CR) 10Rush: [C: 032] openstack: pdns fixup SOA default answer [puppet] - 10https://gerrit.wikimedia.org/r/381445 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [17:41:47] (03PS6) 10Rush: openstack: pdns fixup SOA default answer [puppet] - 10https://gerrit.wikimedia.org/r/381445 (https://phabricator.wikimedia.org/T171494) [17:43:52] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests: Request access to logstash (nda group) for @framawiki - https://phabricator.wikimedia.org/T176364#3651785 (10VColeman) a:03Dzahn Approved on my side [17:44:12] !log gehel@tin Started deploy [wdqs/wdqs@f20b726]: latest wdqs GUI + minor fixes 
[17:44:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:44:55] 10Operations, 10Collection, 10OfflineContentGenerator, 10Readers-Web-Backlog (Tracking), 10Services (watching): Replace OCG in collection extension with Electron - https://phabricator.wikimedia.org/T150872#2799695 (10CKoerner_WMF) [17:46:00] !log gehel@tin Finished deploy [wdqs/wdqs@f20b726]: latest wdqs GUI + minor fixes (duration: 01m 48s) [17:46:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:46:59] 10Operations, 10Reading-Admin, 10Traffic: TEST: redirect small portion of unauthenticated desktop users to mobile web - https://phabricator.wikimedia.org/T117826#3651798 (10CKoerner_WMF) [17:47:01] SMalyshev: wdqs deployment completed, tests are green [17:47:13] gehel: great, thank you! [17:48:44] dinner time, see you later! [17:48:46] I scheduled [config] 381650 Change the Turkish Wiktionary logo (T176008) for Morning SWAT. [17:48:47] T176008: Turkish Wiktionary logo - https://phabricator.wikimedia.org/T176008 [17:51:55] (03PS2) 10Arlolra: Update Parsoid rttest linter config [puppet] - 10https://gerrit.wikimedia.org/r/381238 [17:55:25] (03CR) 10Subramanya Sastry: [C: 031] Update Parsoid rttest linter config [puppet] - 10https://gerrit.wikimedia.org/r/381238 (owner: 10Arlolra) [17:59:56] 10Operations, 10Traffic, 10Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3651828 (10Gilles) Is the plan to use it with hitch in front of it rather than nginx? Or just an upgrade for now and we'll see about that part later? [18:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Your horoscope predicts another unfortunate Morning SWAT (Max 8 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171002T1800). 
[18:00:04] Krinkle, RoanKattouw, and Jayprakash12345: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:34] ok [18:00:52] I can SWAT. [18:01:32] Yay [18:01:34] I'm here [18:01:53] 10Operations, 10Contributors-Team, 10MobileFrontend, 10wikidiff2, and 2 others: Diff page consistently produces 503 on beta cluster on first visit - https://phabricator.wikimedia.org/T176637#3651829 (10MaxSem) I can give you access - what's your wikitech username? [18:02:02] Niharika: Did you write the horoscope one? :D [18:02:16] Unfortunately so. :P [18:02:26] lol [18:02:28] good job [18:03:40] (03CR) 10Niharika29: [C: 032] Change the Turkish Wiktionary logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381650 (https://phabricator.wikimedia.org/T176008) (owner: 10Jayprakash12345) [18:07:48] (03Merged) 10jenkins-bot: Change the Turkish Wiktionary logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381650 (https://phabricator.wikimedia.org/T176008) (owner: 10Jayprakash12345) [18:07:59] (03CR) 10jenkins-bot: Change the Turkish Wiktionary logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381650 (https://phabricator.wikimedia.org/T176008) (owner: 10Jayprakash12345) [18:09:33] Jayprakash12345: Your patch is on mwdebug1002. You can test. [18:10:53] 10Operations, 10MediaWiki-Platform-Team, 10TechCom-RfC, 10HHVM, 10NewPHP: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3623015 (10faidon) Taking into account the lack funding for appserver work, as well as the end of the year fundraising and Christmas freezes, the (tenta... 
[18:12:42] Nikharika: Means What [18:12:46] (03CR) 10Framawiki: [C: 031] Enable NewUserMessage Extension on hiwikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381715 (https://phabricator.wikimedia.org/T177188) (owner: 10Jayprakash12345) [18:13:09] (03CR) 10Framawiki: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381714 (https://phabricator.wikimedia.org/T177187) (owner: 10Jayprakash12345) [18:14:08] Jayprakash12345: You need to test your patch. See https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug [18:14:52] TL:DR - Download Chrome extension (https://chrome.google.com/webstore/detail/wikimediadebug/binmakecefompkjggiklgjenddjoifbb) and select mwdebug1002 in dropdown, turn it on and make sure you see your changes. [18:15:08] ;* [18:17:23] RoanKattouw: Your changes are on mwdebug1002 as well. [18:17:33] Krinkle: Are you around? [18:19:11] Niharika: Ugh, dammit my patch is broken in a way I didn't realize [18:19:19] Until I just tested it [18:19:21] So I need to back that one out [18:19:41] RoanKattouw: Okay, I'll revert. [18:19:46] Thanks [18:20:48] Jayprakash12345: Are you testing your change? [18:22:40] I am newbie. I enable extension. But I dont how to use it? [18:23:31] Jayprakash12345: Go to the page where your changes would appear if deployed. Turkish wiktionary, in this case. [18:23:57] From the extension, select mwdebug1002 in dropdown and turn it on (big button). [18:24:32] Refresh and see your changes. [18:24:50] Just give me a second. [18:25:34] Jayprakash12345: Sure. Ask me if anything is still confusing. [18:29:16] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 234, down: 0, dormant: 0, excluded: 0, unused: 0 [18:29:52] Yah I Turkish wiktionary. there is no deployment, Still I can see old Logo [18:30:18] Jayprakash12345: Did you turn on extension and select "mwdebug1002" in dropdown? 
[18:30:49] yes [18:32:01] I also seen mwdebug1001 and two more drop up menu [18:32:45] RECOVERY - Host db2044.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.69 ms [18:32:52] One minte please [18:34:43] Jayprakash12345: Is https://commons.wikimedia.org/wiki/File:TmY_e12_Vikisözlük_136x165px.png the new logo? [18:37:45] yah, But local admin Deploy it by mediawiki namespace. But in Prefernce old logo still show. I enable extenstion on my second broswer Firefox. In Firefox The Deploment is show [18:38:46] Jayprakash12345: So it works in Firefox but not Chrome? [18:39:57] yah, But local admin Deploy it by mediawiki namespace. But in Prefernce old logo still show. I enable extenstion on my second broswer Firefox. In Firefox The Deploment is show [18:40:24] I don't understand. [18:40:35] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 232, down: 1, dormant: 0, excluded: 0, unused: 0 [18:40:51] https://imgur.com/a/oTbuU [18:41:32] In firefox the Deployment is sow [18:41:33] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests: Request access to logstash (nda group) for @framawiki - https://phabricator.wikimedia.org/T176364#3651957 (10Dzahn) Thanks Victoria! @Framawiki It's going well, you have all the approvals now incl. c-level. Per the workflow steps, I am now contacti... [18:41:48] Jayprakash12345_: So it works for you. The patch looks sane so I'm going to deploy it. [18:43:11] 10Operations, 10LDAP-Access-Requests, 10WMF-Legal, 10WMF-NDA-Requests: Request access to logstash (nda group) for @framawiki - https://phabricator.wikimedia.org/T176364#3651959 (10Dzahn) [18:44:35] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 234, down: 0, dormant: 0, excluded: 0, unused: 0 [18:45:56] Everything is fine. You can run Stashbot. 
[18:46:54] !log niharika29@tin Synchronized static/images/project-logos/: Change the Turkish Wiktionary logo (T176008) (duration: 00m 48s) [18:46:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:47:00] T176008: Turkish Wiktionary logo - https://phabricator.wikimedia.org/T176008 [18:48:03] !log niharika29@tin Synchronized wmf-config/InitialiseSettings.php: Change the Turkish Wiktionary logo (T176008) (duration: 00m 46s) [18:48:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:48:17] Jayprakash12345: You can check now. It's deployed. [18:51:19] After the off the extenstion. the Old Logo Still Show [18:52:17] https://imgur.com/a/IZNiE [18:52:52] https://tr.wiktionary.org/wiki/%C3%96zel:Tercihler [18:54:30] Jayprakash12345: It might be caching. You can wait a while for it to show up. [18:56:09] 10Operations, 10Traffic, 10Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3651975 (10BBlack) Definitely not looking at Hitch presently. Just swapping out Varnish4 for Varnish5 in the existing software stack for both the frontend and backend cache processes. Forwa... [18:56:49] Can you check? In your local computer [18:57:56] Jayprakash12345: Caching affects everywhere. I can't see the new logo either. [18:59:30] In earlyer SWAT zfilipin run Stashbot 3 times (https://phabricator.wikimedia.org/T177165) [19:01:06] Jayprakash12345: His first and second deploys look the same. [19:01:19] They both do the same thing. [19:03:18] I see changes on tr.wiki with enable extenstion. But on Disable, there are still show old logo. [19:04:27] 10Operations, 10ops-codfw, 10DBA: db2044 HW RAID failure - https://phabricator.wikimedia.org/T174764#3652001 (10Papaul) a:05Papaul>03Marostegui Main board replacement complete. 
[19:04:42] (03PS2) 10Jforrester: Update WMF's address [mediawiki-config] - 10https://gerrit.wikimedia.org/r/372824 (https://phabricator.wikimedia.org/T173684) (owner: 10Urbanecm) [19:04:47] I see changes on tr.wiki with enable extenstion. But on Disable, there are still show old logo. [19:04:49] (03CR) 10Jforrester: [C: 031] "Let's get this deployed today." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/372824 (https://phabricator.wikimedia.org/T173684) (owner: 10Urbanecm) [19:08:27] hello niharika [19:10:17] Jayprakash12345_: Now? Purged caching. [19:10:23] I'll be back after lunch. [19:10:59] Ok [19:16:12] Hello Niharika Didi [19:16:45] PROBLEM - puppet last run on wtp1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:26:12] (03CR) 10Hashar: "on deployment-mediawiki04 there are no rules in iptables and ferm refuses to start with the same reason :D" [puppet] - 10https://gerrit.wikimedia.org/r/381073 (https://phabricator.wikimedia.org/T176314) (owner: 10Hashar) [19:39:01] Hello Niharika Ji [19:45:05] RECOVERY - puppet last run on wtp1020 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [19:55:07] 10Operations, 10LDAP-Access-Requests, 10WMF-Legal, 10WMF-NDA-Requests: Request access to logstash (nda group) for @framawiki - https://phabricator.wikimedia.org/T176364#3622856 (10RStallman-legalteam) Thank you @Dzahn for the ping. I can create a customized NDA for Framawiki. @Framawiki - I need a couple... 
[19:56:31] (03CR) 10Herron: [C: 032] Aptrepo: Add --delete flag to rsync-aptrepo cron job [puppet] - 10https://gerrit.wikimedia.org/r/381813 (https://phabricator.wikimedia.org/T177214) (owner: 10Herron) [19:56:44] (03PS2) 10Herron: Aptrepo: Add --delete flag to rsync-aptrepo cron job [puppet] - 10https://gerrit.wikimedia.org/r/381813 (https://phabricator.wikimedia.org/T177214) [20:00:04] gwicke, cscott, arlolra, subbu, bearND, halfak, and Amir1: It is that lovely time of the day again! You are hereby commanded to deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / …. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171002T2000). [20:00:04] No GERRIT patches in the queue for this window AFAICS. [20:06:02] (03PS3) 10Dzahn: Update Parsoid rttest linter config [puppet] - 10https://gerrit.wikimedia.org/r/381238 (owner: 10Arlolra) [20:09:20] (03PS4) 10Dzahn: testreduce: Update Parsoid rttest linter config [puppet] - 10https://gerrit.wikimedia.org/r/381238 (owner: 10Arlolra) [20:09:37] (03CR) 10Dzahn: [C: 032] testreduce: Update Parsoid rttest linter config [puppet] - 10https://gerrit.wikimedia.org/r/381238 (owner: 10Arlolra) [20:15:21] 10Operations, 10Patch-For-Review: install2002 free disk space warning - https://phabricator.wikimedia.org/T177214#3652167 (10herron) 05Open>03Resolved a:03herron Much better! ``` install2002:~$ df -h / Filesystem Size Used Avail Use% Mounted on /dev/vda1 78G 49G 26G 66% / ``` [20:18:25] herron: thanks for that fix :) [20:18:39] np! [20:20:07] oh wow [20:20:09] lucky guess :) [20:20:30] thx herron! [20:20:43] haha [20:42:37] (03PS1) 10ArielGlenn: move most hardcoded paths for base dir of misc dumps out to profile [puppet] - 10https://gerrit.wikimedia.org/r/381838 (https://phabricator.wikimedia.org/T175528) [20:48:59] really? the one changeset I do when legitimately half-asleep and jenkins gives it the thumbs up? 
[20:49:08] * apergos resolves to do all changesets when half-asleep [20:49:52] (03Restored) 10Hashar: (WIP) Puppet compile an host via rspec (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/308889 (owner: 10Hashar) [20:50:01] (03PS4) 10Hashar: (WIP) Puppet compile an host via rspec (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/308889 [20:50:32] 10Operations: Upgrade puppetDB to version 3.2 or newer - https://phabricator.wikimedia.org/T177253#3652246 (10herron) [20:54:48] (03PS1) 10Dzahn: remove endowment.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/381866 (https://phabricator.wikimedia.org/T136735) [21:00:04] dapatrick, bawolff, and Reedy: I, the Bot under the Fountain, allow thee, The Deployer, to do Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171002T2100). [21:00:04] No GERRIT patches in the queue for this window AFAICS. [21:04:30] 10Operations: Upgrade to puppet 4 (4.8 or newer) - https://phabricator.wikimedia.org/T177254#3652273 (10herron) [21:28:18] (03PS2) 10ArielGlenn: move most hardcoded paths for base dir of misc dumps out to profile [puppet] - 10https://gerrit.wikimedia.org/r/381838 (https://phabricator.wikimedia.org/T175528) [21:52:33] (03PS1) 10Eevans: restbase-ng: Collect type=Table (not ColumnFamily) [puppet] - 10https://gerrit.wikimedia.org/r/381884 (https://phabricator.wikimedia.org/T169936) [21:58:01] (03CR) 10Eevans: [C: 031] "This is safe to apply at any time, and without coordination. 
It is only applicable to the new restbase-ng cluster (not yet in production)" [puppet] - 10https://gerrit.wikimedia.org/r/381884 (https://phabricator.wikimedia.org/T169936) (owner: 10Eevans) [22:03:46] (03PS1) 10Madhuvishy: toolschecker: Fix labsdb1005 test to use a current DB [puppet] - 10https://gerrit.wikimedia.org/r/381885 (https://phabricator.wikimedia.org/T177103) [22:06:34] (03CR) 10Madhuvishy: [C: 032] toolschecker: Fix labsdb1005 test to use a current DB [puppet] - 10https://gerrit.wikimedia.org/r/381885 (https://phabricator.wikimedia.org/T177103) (owner: 10Madhuvishy) [22:28:05] RECOVERY - puppet last run on gerrit2001 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [22:31:14] (03PS1) 10Madhuvishy: toolschecker: Fix sudo options in webservice start command [puppet] - 10https://gerrit.wikimedia.org/r/381887 (https://phabricator.wikimedia.org/T177103) [22:32:05] (03CR) 10Madhuvishy: [C: 032] toolschecker: Fix sudo options in webservice start command [puppet] - 10https://gerrit.wikimedia.org/r/381887 (https://phabricator.wikimedia.org/T177103) (owner: 10Madhuvishy) [22:37:11] (03PS1) 10Madhuvishy: toolschecker: Fix sudo options typo in more places [puppet] - 10https://gerrit.wikimedia.org/r/381889 (https://phabricator.wikimedia.org/T177103) [22:37:48] (03CR) 10Madhuvishy: [C: 032] toolschecker: Fix sudo options typo in more places [puppet] - 10https://gerrit.wikimedia.org/r/381889 (https://phabricator.wikimedia.org/T177103) (owner: 10Madhuvishy) [22:45:53] (03PS1) 10Madhuvishy: toolschecker: Remove uwsgi-python check from grid webservice test [puppet] - 10https://gerrit.wikimedia.org/r/381891 (https://phabricator.wikimedia.org/T177103) [22:46:23] (03CR) 10Madhuvishy: [C: 032] toolschecker: Remove uwsgi-python check from grid webservice test [puppet] - 10https://gerrit.wikimedia.org/r/381891 (https://phabricator.wikimedia.org/T177103) (owner: 10Madhuvishy) [22:56:13] !log created newsletter tables on officewiki T176199 
[22:56:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:56:19] T176199: Enable Newsletter extension in office.wikimedia.org (and Structured Discussions in its related Talk namespace) - https://phabricator.wikimedia.org/T176199 [23:00:05] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor I ♥ Unicode. All rise for Evening SWAT (Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171002T2300). [23:00:05] James_F and RoanKattouw: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:52] !log releases1001 - remove and reinstall jenkins [23:00:54] I can do it [23:00:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:01:14] Thanks RoanKattouw. [23:02:45] (03CR) 10Catrope: [C: 032] Update WMF's address [mediawiki-config] - 10https://gerrit.wikimedia.org/r/372824 (https://phabricator.wikimedia.org/T173684) (owner: 10Urbanecm) [23:04:36] (03Merged) 10jenkins-bot: Update WMF's address [mediawiki-config] - 10https://gerrit.wikimedia.org/r/372824 (https://phabricator.wikimedia.org/T173684) (owner: 10Urbanecm) [23:06:18] (03CR) 10jenkins-bot: Update WMF's address [mediawiki-config] - 10https://gerrit.wikimedia.org/r/372824 (https://phabricator.wikimedia.org/T173684) (owner: 10Urbanecm) [23:06:53] (03PS2) 10Dzahn: remove endowment.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/381866 (https://phabricator.wikimedia.org/T136735) [23:07:02] OK the address change is on mwdebug1002, not sure if that's testable? [23:07:21] James_F: Perhaps I should edit a page that you watch from 1002? 
[23:07:24] send a post card to it:) [23:07:45] (03PS2) 10Catrope: Enable jQuery 3 on Wiktionary sites [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379947 (https://phabricator.wikimedia.org/T124742) (owner: 10Krinkle) [23:07:52] (03CR) 10Catrope: [C: 032] Enable jQuery 3 on Wiktionary sites [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379947 (https://phabricator.wikimedia.org/T124742) (owner: 10Krinkle) [23:09:02] (03CR) 10Dzahn: [C: 032] remove endowment.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/381866 (https://phabricator.wikimedia.org/T136735) (owner: 10Dzahn) [23:11:18] Krinkle: You around to validate the jQ3 change? [23:11:33] Sure [23:12:03] Basically, $.fn.jquery >> 3.2.1 in console, and a "Migrate plugin in effect" log message [23:13:28] OK cool [23:14:26] (03Merged) 10jenkins-bot: Enable jQuery 3 on Wiktionary sites [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379947 (https://phabricator.wikimedia.org/T124742) (owner: 10Krinkle) [23:16:09] (03CR) 10jenkins-bot: Enable jQuery 3 on Wiktionary sites [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379947 (https://phabricator.wikimedia.org/T124742) (owner: 10Krinkle) [23:19:36] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Update WMF mailing address (T173684) and enable jQuery 3 on Wiktionaries (T124742) (duration: 00m 48s) [23:19:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:19:43] T173684: Update WMF's office address in mediawiki-config - https://phabricator.wikimedia.org/T173684 [23:19:43] T124742: Upgrade to jQuery 3 - https://phabricator.wikimedia.org/T124742 [23:33:37] !log releases2001 - kill and restart jenkins with systemctl [23:33:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:31] !log catrope@tin Synchronized php-1.31.0-wmf.1/resources/src/mediawiki.rcfilters/dm/mw.rcfilters.dm.FiltersViewModel.js: T177107 (duration: 00m 47s) [23:36:36] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:36] T177107: Highlighting many changes at init time is very slow - https://phabricator.wikimedia.org/T177107 [23:38:05] !log catrope@tin Synchronized php-1.31.0-wmf.1/extensions/VisualEditor/: T177250 (duration: 00m 47s) [23:38:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:38:10] T177250: VisualEditor timing data broken by using a different target name in some circumstances - https://phabricator.wikimedia.org/T177250 [23:41:23] (03PS1) 10Faidon Liambotis: Remove ELF binaries from across the tree [puppet] - 10https://gerrit.wikimedia.org/r/381901 [23:41:36] 32-bit binaries ftw [23:41:54] (03CR) 10jenkins-bot: [V: 04-1] Remove ELF binaries from across the tree [puppet] - 10https://gerrit.wikimedia.org/r/381901 (owner: 10Faidon Liambotis) [23:42:57] (03PS2) 10Faidon Liambotis: Remove ELF binaries from across the tree [puppet] - 10https://gerrit.wikimedia.org/r/381901 [23:43:10] heh commit-message-validator doesn't like Unicode ("…" in this case) [23:43:43] (03CR) 10Faidon Liambotis: [C: 032] Remove ELF binaries from across the tree [puppet] - 10https://gerrit.wikimedia.org/r/381901 (owner: 10Faidon Liambotis) [23:56:43] (03PS1) 10Dzahn: add releases-jenkins to misc-web cluster [dns] - 10https://gerrit.wikimedia.org/r/381903 (https://phabricator.wikimedia.org/T164030) [23:57:12] paravoid: that's...weird. Also I'm not sure why operations-puppet is running it under python2, it ideally should use python3 [23:58:10] UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 42: ordinal not in range(128) -- not a sane default locale? [23:58:28] and by sane I mean C.utf-8 I suppose [23:59:27] (03CR) 10Dzahn: [C: 04-2] "endowment as a site has been removed, this can be abandoned" [puppet] - 10https://gerrit.wikimedia.org/r/366817 (owner: 10Muehlenhoff)
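Note on the UnicodeDecodeError quoted at 23:58:10: a minimal sketch of how that kind of message can arise, not the actual commit-message-validator code. The subject string and the `try_decode` helper are hypothetical, made up for illustration. The assumption is that the commit message reached the validator as UTF-8 bytes and was decoded with the ASCII codec, which is what tends to happen under Python 2 or under a C/POSIX locale. The "…" character (U+2026) encodes to the three bytes e2 80 a6, so the first byte the ASCII codec rejects is 0xe2.

```python
# Hypothetical commit subject: 42 ASCII characters followed by "…" (U+2026),
# so the first non-ASCII byte lands at offset 42 as in the quoted error.
subject = "x" * 42 + "\u2026"
raw = subject.encode("utf-8")  # "…" becomes the three bytes e2 80 a6


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or the error text if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"{type(exc).__name__}: {exc}"


# Decoding as ASCII fails on byte 0xe2; decoding as UTF-8 round-trips.
ascii_result = try_decode(raw, "ascii")
utf8_result = try_decode(raw, "utf-8")

print(ascii_result)
print(utf8_result == subject)
```

Decoding explicitly as UTF-8, rather than relying on whatever default the interpreter or locale supplies, avoids the failure in this sketch. Whether that matches the fix eventually applied to the validator is not stated in the log.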