[00:00:05] !log catrope@tin Finished scap: Need to update i18n for a new Echo message (duration: 23m 08s)
[00:00:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:03:40] ty RoanKattouw_away
[00:14:47] (CR) Gergő Tisza: [C: 1] logging: Configure monolog to output stack traces [mediawiki-config] - https://gerrit.wikimedia.org/r/236994 (https://phabricator.wikimedia.org/T89169) (owner: BryanDavis)
[00:19:35] PROBLEM - HHVM processes on mw1154 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:19:35] PROBLEM - salt-minion processes on mw1154 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:20:36] PROBLEM - dhclient process on mw1154 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:21:05] PROBLEM - nutcracker process on mw1154 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:28:19] bd808, you want me to deploy that logging for XMP?
[00:28:50] Krenair: I put it up for swat tomorrow morning
[00:28:54] ok
[00:29:18] I had 2 logging config things
[00:44:05] (PS1) Dzahn: mailman: rsync command for exim copy [puppet] - https://gerrit.wikimedia.org/r/237001 (https://phabricator.wikimedia.org/T110138)
[00:45:56] operations, Labs, Patch-For-Review: audit labs versus production ssh keys - https://phabricator.wikimedia.org/T108078#1618912 (Krenair) @RobH: Do you think we should turn this into some sort of regular check? Not sure icinga necessarily would be appropriate, but...
[00:46:05] (CR) Dzahn: [C: 2] mailman: rsync command for exim copy [puppet] - https://gerrit.wikimedia.org/r/237001 (https://phabricator.wikimedia.org/T110138) (owner: Dzahn)
[00:49:02] operations, Labs, Patch-For-Review: audit labs versus production ssh keys - https://phabricator.wikimedia.org/T108078#1618923 (Dzahn) @Krenair I think in a perfect world it would be detected by jenkins.
If somebody touches a file in admin and there is a key in it it would have to compare it to all exist...
[00:49:35] PROBLEM - puppet last run on mw1154 is CRITICAL: CRITICAL: puppet fail
[00:50:45] operations, Labs, Patch-For-Review: audit labs versus production ssh keys - https://phabricator.wikimedia.org/T108078#1618924 (Krenair) >>! In T108078#1618923, @Dzahn wrote: > @Krenair I think in a perfect world it would be detected by jenkins. If somebody touches a file in admin and there is a key in i...
[00:53:02] operations, Labs, Patch-For-Review: audit labs versus production ssh keys - https://phabricator.wikimedia.org/T108078#1618928 (Krenair) That said, I think jenkins might be less annoying in operations/puppet than it is where I'm used to it - i.e. VE-MW (where you can't merge anything without being jenkin...
[00:57:28] (PS1) BryanDavis: Backport of D44265: filter_var_array: do not fall back to FILTER_DEFAULT [debs/hhvm] - https://gerrit.wikimedia.org/r/237006 (https://phabricator.wikimedia.org/T107677)
[00:57:30] (PS1) BryanDavis: Backport of D45165: Limit log message length for unserialize failures [debs/hhvm] - https://gerrit.wikimedia.org/r/237007
[01:04:31] (PS1) Dzahn: mailman: rsync for local import [puppet] - https://gerrit.wikimedia.org/r/237008 (https://phabricator.wikimedia.org/T110138)
[01:05:32] operations, Labs: labstore1002 not mounting all LVs after reboot - https://phabricator.wikimedia.org/T110832#1618943 (yuvipanda) Open>Resolved a:yuvipanda Sorted now.
[01:05:41] (PS2) Dzahn: mailman: rsync for local import [puppet] - https://gerrit.wikimedia.org/r/237008 (https://phabricator.wikimedia.org/T110138)
[01:06:20] (CR) Dzahn: [C: 2] mailman: rsync for local import [puppet] - https://gerrit.wikimedia.org/r/237008 (https://phabricator.wikimedia.org/T110138) (owner: Dzahn)
[01:07:10] (PS3) Dzahn: lists: lower A[AAA] records to 5M [dns] - https://gerrit.wikimedia.org/r/233049 (https://phabricator.wikimedia.org/T110132) (owner: John F. Lewis)
[01:08:09] (CR) Dzahn: [C: 2] lists: lower A[AAA] records to 5M [dns] - https://gerrit.wikimedia.org/r/233049 (https://phabricator.wikimedia.org/T110132) (owner: John F. Lewis)
[01:10:38] operations, Wikimedia-Mailing-lists, Patch-For-Review: lower lists.wikimedia.org TTL to 5 min - https://phabricator.wikimedia.org/T110132#1618951 (Dzahn) Open>Resolved
[01:10:39] operations, Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1618952 (Dzahn)
[01:12:41] operations, Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1618954 (Dzahn)
[01:13:23] operations, Wikimedia-Mailing-lists: rsync exim spool directory - https://phabricator.wikimedia.org/T110440#1618955 (Dzahn) https://gerrit.wikimedia.org/r/#/c/237001/ https://gerrit.wikimedia.org/r/#/c/237008/
[01:24:30] (PS1) Dzahn: mailman: import even unknown lists [puppet] - https://gerrit.wikimedia.org/r/237011 (https://phabricator.wikimedia.org/T110131)
[01:26:32] (PS2) Dzahn: mailman: import even unknown lists [puppet] - https://gerrit.wikimedia.org/r/237011 (https://phabricator.wikimedia.org/T110131)
[01:26:57] (CR) Dzahn: [C: 2] mailman: import even unknown lists [puppet] - https://gerrit.wikimedia.org/r/237011 (https://phabricator.wikimedia.org/T110131) (owner: Dzahn)
[01:52:27] operations,
Wikimedia-Mailing-lists: test sending individual mails from fermium during migration - https://phabricator.wikimedia.org/T110441#1619041 (Dzahn) ``` root@fermium:/var/spool/exim4/input# exim -v postmaster@wikimedia.org From: dzahn@wikimedia.org To: dzahn@wikimedia.org Subject: Testing exim on...
[01:57:04] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 1 below the confidence bounds
[01:58:05] operations, Wikimedia-Mailing-lists: shutdown sodium, decom - https://phabricator.wikimedia.org/T110142#1619052 (Dzahn) p:High>Normal
[02:01:10] operations, Wikimedia-Mailing-lists, Patch-For-Review: rsync the diff since mail was held on sodium - https://phabricator.wikimedia.org/T110138#1619053 (Dzahn) in preparation for this i already deleted a whole bunch of spam from sodium, obvious spam from gTLDs such as .racing, .space and .xyz was espec...
[02:11:14] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 12 data above and 1 below the confidence bounds
[02:20:05] PROBLEM - HTTPS on cp3033 is CRITICAL: Return code of 110 is out of bounds
[02:22:26] RECOVERY - HTTPS on cp3033 is OK: SSLXNN OK - 36 OK
[02:23:04] PROBLEM - puppet last run on ms-be3002 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:23:05] PROBLEM - HTTPS on cp3049 is CRITICAL: Return code of 110 is out of bounds
[02:23:05] PROBLEM - HTTPS on cp3015 is CRITICAL: Return code of 110 is out of bounds
[02:25:04] RECOVERY - HTTPS on cp3049 is OK: SSLXNN OK - 36 OK
[02:25:14] RECOVERY - HTTPS on cp3015 is OK: SSLXNN OK - 36 OK
[02:26:15] PROBLEM - puppet last run on cp3031 is CRITICAL: CRITICAL: puppet fail
[02:28:18] !log l10nupdate@tin Synchronized php-1.26wmf21/cache/l10n: l10nupdate for 1.26wmf21 (duration: 06m 44s)
[02:28:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:30:58] (PS1) Dzahn: mailman: use rsync compression for
archive transfer [puppet] - https://gerrit.wikimedia.org/r/237013
[02:31:50] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf21) at 2015-09-09 02:31:50+00:00
[02:31:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:36:45] operations, Discovery, Elasticsearch, Epic: EPIC: Cultivating the Elasticsearch garden (operational lessons from 1.7.1 upgrade) - https://phabricator.wikimedia.org/T109089#1619199 (Deskana)
[02:40:46] PROBLEM - Restbase root url on xenon is CRITICAL: Connection refused
[02:42:46] RECOVERY - Restbase root url on xenon is OK: HTTP OK: HTTP/1.1 200 - 15150 bytes in 0.012 second response time
[02:49:14] RECOVERY - puppet last run on ms-be3002 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[02:52:25] RECOVERY - puppet last run on cp3031 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:52:37] !log l10nupdate@tin Synchronized php-1.26wmf22/cache/l10n: l10nupdate for 1.26wmf22 (duration: 05m 34s)
[02:52:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:55:24] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf22) at 2015-09-09 02:55:24+00:00
[02:55:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:17:57] (PS2) Dzahn: mailman: use rsync compression for archive transfer [puppet] - https://gerrit.wikimedia.org/r/237013
[03:19:05] (CR) Dzahn: [C: 2] mailman: use rsync compression for archive transfer [puppet] - https://gerrit.wikimedia.org/r/237013 (owner: Dzahn)
[03:24:35] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 8.33% of data above the critical threshold [500.0]
[03:28:08] (PS3) Dzahn: admin: add user for addshore [puppet] - https://gerrit.wikimedia.org/r/236793 (https://phabricator.wikimedia.org/T111756)
[03:30:46] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the
threshold [250.0]
[03:35:45] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[03:50:36] operations, Wikimedia-Language-setup, Wikimedia-Site-Requests, Patch-For-Review, user-notice: Rename "be-x-old" to "be-tarask" - https://phabricator.wikimedia.org/T11823#1619265 (Legoktm) >>! In T11823#1613434, @Elitre wrote: > If this is fixed, I think there should be user notice of the change....
[04:47:05] PROBLEM - Restbase root url on xenon is CRITICAL: Connection refused
[04:47:45] PROBLEM - Restbase endpoints health on xenon is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=127.0.0.1, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[04:56:25] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 10.71% of data above the critical threshold [100000000.0]
[05:08:44] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 10.71% of data above the critical threshold [100000000.0]
[05:11:28] !log l10nupdate@tin ResourceLoader cache refresh completed at Wed Sep 9 05:11:28 UTC 2015 (duration 11m 27s)
[05:11:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[05:40:45] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[05:42:51] (PS1) Florianschmidtwelzow: Run suggested search query in wmf wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/237029 (https://phabricator.wikimedia.org/T105202)
[06:02:44] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 2 below the confidence bounds
[06:12:45] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 6 below the confidence bounds
[06:13:59] operations, Labs, Salt:
salt does not run reliably for toollabs / labs generally - https://phabricator.wikimedia.org/T99213#1619382 (ArielGlenn) wrong versions have been fixed. now dealing with bad minion count.
[06:20:44] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[06:26:44] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 9 below the confidence bounds
[06:27:54] (PS4) Muehlenhoff: mediawiki jobrunner: mark as notrack [puppet] - https://gerrit.wikimedia.org/r/236767
[06:28:45] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[06:30:54] PROBLEM - puppet last run on cp2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:35] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:39] (CR) Muehlenhoff: [C: 2 V: 2] mediawiki jobrunner: mark as notrack [puppet] - https://gerrit.wikimedia.org/r/236767 (owner: Muehlenhoff)
[06:33:15] PROBLEM - puppet last run on mw2050 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:35] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:45] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:23] operations, Labs, Salt: salt does not run reliably for toollabs / labs generally - https://phabricator.wikimedia.org/T99213#1619404 (MoritzMuehlenhoff) >>! In T99213#1616465, @ArielGlenn wrote: > I have a lame script for this stuff, well a few, and I should stick them somewhere others can steal pieces...
[06:53:56] operations, Traffic: Deprecate pybal SSH health checks - https://phabricator.wikimedia.org/T111899#1619467 (MoritzMuehlenhoff) NEW
[06:55:37] (Abandoned) Muehlenhoff: Add definitions for LVSes in codfw [puppet] - https://gerrit.wikimedia.org/r/236519 (owner: Muehlenhoff)
[06:55:52] (Abandoned) Muehlenhoff: Add definitions for LVSes in eqiad [puppet] - https://gerrit.wikimedia.org/r/236765 (owner: Muehlenhoff)
[06:56:55] RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:56:56] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 9 below the confidence bounds
[07:05:14] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[07:11:45] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100%
[07:13:05] RECOVERY - Host mw1085 is UP: PING WARNING - Packet loss = 66%, RTA = 0.94 ms
[07:17:10] (PS1) Yuvipanda: k8s: Switch to using token auth instead of basic auth [puppet] - https://gerrit.wikimedia.org/r/237043
[07:17:17] (CR) jenkins-bot: [V: -1] k8s: Switch to using token auth instead of basic auth [puppet] - https://gerrit.wikimedia.org/r/237043 (owner: Yuvipanda)
[07:17:52] (PS2) Yuvipanda: k8s: Switch to using token auth instead of basic auth [puppet] - https://gerrit.wikimedia.org/r/237043
[07:18:30] (CR) Yuvipanda: [C: 2 V: 2] k8s: Switch to using token auth instead of basic auth [puppet] - https://gerrit.wikimedia.org/r/237043 (owner: Yuvipanda)
[07:25:44] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[07:27:25] RECOVERY - puppet last run on mw2050 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[07:27:46] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0
failures
[07:27:55] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[07:32:35] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.69% of data above the critical threshold [500.0]
[07:36:16] argh
[07:36:20] good morning!
[07:36:48] aarrggh to you too, hashar
[07:37:19] I could use the puppet log from labnodepool1001.eqiad.wmnet , seems using the provider trebuchet failed something ( https://gerrit.wikimedia.org/r/#/c/236769/3/manifests/role/nodepool.pp,unified ) :(
[07:37:38] ssh root@labnodepool1001.eqiad.wmnet tail -n 100 /var/log/puppet.log
[07:38:06] * YuviPanda is in an airport waiting in line to board
[07:38:09] Blocked-on-Operations, operations, Continuous-Integration-Infrastructure, Discovery, Elasticsearch: Please backport ElasticSearch 1.7.x from wikimedia-trusty to wikimedia-precise for CI needs - https://phabricator.wikimedia.org/T111781#1619653 (MoritzMuehlenhoff) a:MoritzMuehlenhoff
[07:38:25] (PS2) Smalyshev: Create real URIs for wikidata RDF URIs [puppet] - https://gerrit.wikimedia.org/r/230483 (https://phabricator.wikimedia.org/T97195)
[07:42:35] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:43:56] (CR) Addshore: "The key is new and will only be used here." [puppet] - https://gerrit.wikimedia.org/r/236793 (https://phabricator.wikimedia.org/T111756) (owner: Dzahn)
[07:48:35] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 36.36% of data above the critical threshold [500.0]
[07:56:35] (PS1) Hashar: contint: tweak Icinga contact group for prod servers [puppet] - https://gerrit.wikimedia.org/r/237045
[07:56:43] operations, Performance-Team: New URL scheme for service-generated thumbnails - https://phabricator.wikimedia.org/T111048#1619709 (Gilles)
[07:57:26] (CR) Hashar: "That should poke me by email and the releng IRC channel whenever something goes wrong.
admins is still around." [puppet] - https://gerrit.wikimedia.org/r/237045 (owner: Hashar)
[08:10:45] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[08:12:16] operations, Continuous-Integration-Infrastructure, Discovery, Elasticsearch, Patch-For-Review: elasticsearch 1.6.0 fails to start after reboot - https://phabricator.wikimedia.org/T109497#1619752 (MoritzMuehlenhoff)
[08:12:17] Blocked-on-Operations, operations, Continuous-Integration-Infrastructure, Discovery, Elasticsearch: Please backport ElasticSearch 1.7.x from wikimedia-trusty to wikimedia-precise for CI needs - https://phabricator.wikimedia.org/T111781#1619750 (MoritzMuehlenhoff) Open>Resolved elasticsearch...
[08:16:16] Blocked-on-Operations, operations, Continuous-Integration-Infrastructure, Discovery, Elasticsearch: Please backport ElasticSearch 1.7.x from wikimedia-trusty to wikimedia-precise for CI needs - https://phabricator.wikimedia.org/T111781#1619767 (hashar) Thank you very much, will get it upgraded on...
[08:21:07] operations, Labs, Salt: salt does not run reliably for toollabs / labs generally - https://phabricator.wikimedia.org/T99213#1619779 (ArielGlenn) bad minion count instances are done with one exception, wbdocs.scrumbugz.eqiad.wmflabs which hangs on ssh. scripting up the key regeneration now. Note tha...
[08:26:40] (PS1) MarcoAurelio: Enabling 'flood' flag at scowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/237049 (https://phabricator.wikimedia.org/T111753)
[08:27:27] operations, Continuous-Integration-Infrastructure, Discovery, Elasticsearch, Patch-For-Review: elasticsearch 1.6.0 fails to start after reboot - https://phabricator.wikimedia.org/T109497#1619800 (hashar) Upgraded them: ``` root@integration-saltmaster:~# salt '*precise*' pkg.install elasticsearch...
[08:27:44] PROBLEM - salt-minion processes on es1018 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:27:58] PROBLEM - salt-minion processes on kafka1022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:28:44] PROBLEM - salt-minion processes on labsdb1004 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:31:59] (Abandoned) Hashar: elasticsearch: ensure /var/run subdir exists [puppet] - https://gerrit.wikimedia.org/r/233413 (https://phabricator.wikimedia.org/T109497) (owner: Hashar)
[08:33:37] (CR) Filippo Giunchedi: [C: -1] librenms - enable LDAP auth (WIP) (1 comment) [puppet] - https://gerrit.wikimedia.org/r/229299 (https://phabricator.wikimedia.org/T107702) (owner: Dzahn)
[08:33:45] (PS1) Alexandros Kosiaris: akosiaris: htoprc cpu_count_from_zero=1 [puppet] - https://gerrit.wikimedia.org/r/237051
[08:33:55] operations, Continuous-Integration-Infrastructure, Discovery, Elasticsearch, Patch-For-Review: elasticsearch 1.6.0 fails to start after reboot - https://phabricator.wikimedia.org/T109497#1619815 (hashar) I have rebooted integration-slave-precise1014 and it came back. Then I removed the Gerrit ch...
[08:34:18] matt_flaschen, if you want me to do a task and not miss it, please do one of the following: assign the task to me, if it is clear it is me who is going to do it and we already talked about it in the past
[08:35:00] !log installed spice security updates on labvirt*, ganeti* and labnodepool1001
[08:35:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:35:07] (CR) Filippo Giunchedi: [C: 1] admin: add kartik to apertium-admins [puppet] - https://gerrit.wikimedia.org/r/235854 (https://phabricator.wikimedia.org/T111360) (owner: Dzahn)
[08:35:07] operations, Continuous-Integration-Infrastructure, Discovery, Elasticsearch, Patch-For-Review: elasticsearch 1.6.0 fails to start after reboot - https://phabricator.wikimedia.org/T109497#1619823 (hashar) Open>Resolved a:hashar Actually ElasticSearch is started and the machines reboot jus...
[08:35:08] matt_flaschen, add at least the operations project to the task, blocked-by-operations if you really are blocked by it
[08:35:49] (PS2) Alexandros Kosiaris: akosiaris: htoprc cpu_count_from_zero=1 [puppet] - https://gerrit.wikimedia.org/r/237051
[08:35:53] matt_flaschen, or 3, lobby for a new coordination workflow as proposed here: https://wikitech.wikimedia.org/wiki/Schema_changes/Coordination_proposal
[08:36:06] RECOVERY - salt-minion processes on kafka1022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:36:07] (CR) Alexandros Kosiaris: [C: 2 V: 2] akosiaris: htoprc cpu_count_from_zero=1 [puppet] - https://gerrit.wikimedia.org/r/237051 (owner: Alexandros Kosiaris)
[08:39:55] RECOVERY - salt-minion processes on es1018 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:40:33] (PS1) Alexandros Kosiaris: apertium: fix logrotate bug caused by wildcards [puppet] - https://gerrit.wikimedia.org/r/237054
[08:40:54] RECOVERY - salt-minion processes on labsdb1004 is OK: PROCS OK: 1
process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:42:33] (PS1) Filippo Giunchedi: install_server: don't prompt for partitions without a partman method [puppet] - https://gerrit.wikimedia.org/r/237055 (https://phabricator.wikimedia.org/T111080)
[08:42:36] (CR) Alexandros Kosiaris: [C: 2] apertium: fix logrotate bug caused by wildcards [puppet] - https://gerrit.wikimedia.org/r/237054 (owner: Alexandros Kosiaris)
[08:43:17] (PS2) Filippo Giunchedi: install_server: don't prompt for partitions without a partman method [puppet] - https://gerrit.wikimedia.org/r/237055 (https://phabricator.wikimedia.org/T111080)
[08:43:23] (CR) Filippo Giunchedi: [C: 2 V: 2] install_server: don't prompt for partitions without a partman method [puppet] - https://gerrit.wikimedia.org/r/237055 (https://phabricator.wikimedia.org/T111080) (owner: Filippo Giunchedi)
[08:55:29] (PS1) MarcoAurelio: Enable Extension:GuidedTour on srwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/237056 (https://phabricator.wikimedia.org/T107862)
[09:00:04] aude: Respected human, time to deploy Wikidata (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150909T0900). Please do the needful.
[09:04:05] (CR) Alexandros Kosiaris: [C: -1] "inline comments. This probably needs some experimentation as it is not clear by the documentation listed what all those settings do."
(1 comment) [puppet] - https://gerrit.wikimedia.org/r/229299 (https://phabricator.wikimedia.org/T107702) (owner: Dzahn)
[09:08:06] (PS3) DCausse: Disable dynamic scripting in Elasticsearch [puppet] - https://gerrit.wikimedia.org/r/224651 (owner: Manybubbles)
[09:08:14] (CR) jenkins-bot: [V: -1] Disable dynamic scripting in Elasticsearch [puppet] - https://gerrit.wikimedia.org/r/224651 (owner: Manybubbles)
[09:13:29] (PS4) DCausse: Disable dynamic scripting in Elasticsearch [puppet] - https://gerrit.wikimedia.org/r/224651 (owner: Manybubbles)
[09:18:53] (CR) DCausse: [C: -1] "This should not be merged before Ibeb087afca1c3daa8467792b428bbeb76dfc9c79 (requires extra plugin v1.7.1)" [puppet] - https://gerrit.wikimedia.org/r/224651 (owner: Manybubbles)
[09:23:10] (PS5) DCausse: Disable dynamic scripting in Elasticsearch [puppet] - https://gerrit.wikimedia.org/r/224651 (owner: Manybubbles)
[09:25:03] (CR) DCausse: [C: -1] "This should not be merged before Ibeb087afca1c3daa8467792b428bbeb76dfc9c79 (requires extra plugin v1.7.1)" [puppet] - https://gerrit.wikimedia.org/r/224651 (owner: Manybubbles)
[09:29:23] (PS1) Hashar: Revert "nodepool: setup scripts are in integration/config" [puppet] - https://gerrit.wikimedia.org/r/237062
[09:29:30] (PS2) Hashar: Revert "nodepool: setup scripts are in integration/config" [puppet] - https://gerrit.wikimedia.org/r/237062
[09:33:13] (PS3) Hashar: Revert "nodepool: setup scripts are in integration/config" [puppet] - https://gerrit.wikimedia.org/r/237062 (https://phabricator.wikimedia.org/T111925)
[09:35:22] (PS1) Aude: Enable usage tracking for Wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/237063 (https://phabricator.wikimedia.org/T111142)
[09:36:18] (PS2) Muehlenhoff: Enable initial api appserver in codfw [puppet] - https://gerrit.wikimedia.org/r/236771
[09:37:23] (PS1) MarcoAurelio: Change wgSitename, wgMetaNamespace and
wgMetaNamespaceTalk for srwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/237064 (https://phabricator.wikimedia.org/T111247)
[09:39:03] (CR) Muehlenhoff: [C: 2 V: 2] Enable initial api appserver in codfw [puppet] - https://gerrit.wikimedia.org/r/236771 (owner: Muehlenhoff)
[09:40:37] (CR) Aude: [C: 2] Enable usage tracking for Wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/237063 (https://phabricator.wikimedia.org/T111142) (owner: Aude)
[09:40:43] (Merged) jenkins-bot: Enable usage tracking for Wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/237063 (https://phabricator.wikimedia.org/T111142) (owner: Aude)
[09:40:43] * aude deploys
[09:41:29] !log aude@tin Synchronized usagetracking.dblist: Enable usage tracking on Wikinews (duration: 00m 12s)
[09:41:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:42:10] (PS1) Hashar: nodepool: setup scripts are in integration/config [puppet] - https://gerrit.wikimedia.org/r/237065 (https://phabricator.wikimedia.org/T111925)
[09:43:01] (CR) jenkins-bot: [V: -1] nodepool: setup scripts are in integration/config [puppet] - https://gerrit.wikimedia.org/r/237065 (https://phabricator.wikimedia.org/T111925) (owner: Hashar)
[09:44:05] (PS2) Hashar: nodepool: setup scripts are in integration/config [puppet] - https://gerrit.wikimedia.org/r/237065 (https://phabricator.wikimedia.org/T111925)
[09:55:14] (PS1) Muehlenhoff: Enable ferm on mw1017 (test.wikipedia.org) [puppet] - https://gerrit.wikimedia.org/r/237068
[09:59:15] operations, Database: Puppetize grants for mysql backups on dbstore hosts - https://phabricator.wikimedia.org/T111929#1620069 (jcrespo) NEW a:jcrespo
[09:59:58] (PS1) Muehlenhoff: Enable ferm on initial jobrunner in codfw [puppet] - https://gerrit.wikimedia.org/r/237069
[10:00:45] operations, Database: Puppetize grants for mysql backups on dbstore hosts -
https://phabricator.wikimedia.org/T111929#1620077 (jcrespo) This would be a super-simple task, if it wasn't because grants puppetization is very lacking, and I have to refactor it completely even for such a simple task.
[10:08:56] (CR) Hashar: "From T111374 , will be talked about in next ops meeting on Monday Sep. 14th." [puppet] - https://gerrit.wikimedia.org/r/235742 (https://phabricator.wikimedia.org/T111374) (owner: Hashar)
[10:14:35] * aude deploying more
[10:14:49] (PS1) Aude: Sort wikidataclient.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/237072
[10:14:51] (PS1) Aude: Enable usage tracking on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/237073
[10:14:53] (PS1) Aude: Enable usage tracking on test2.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/237074
[10:18:52] (PS4) Hashar: (WIP) admin: support members aliasing (WIP) [puppet] - https://gerrit.wikimedia.org/r/234554
[10:19:41] (PS1) Aude: Remove usagetracking.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/237075
[10:20:00] (CR) Aude: [C: 2] Sort wikidataclient.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/237072 (owner: Aude)
[10:20:08] (Merged) jenkins-bot: Sort wikidataclient.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/237072 (owner: Aude)
[10:20:18] (CR) Aude: [C: 2] Enable usage tracking on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/237073 (owner: Aude)
[10:20:24] (Merged) jenkins-bot: Enable usage tracking on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/237073 (owner: Aude)
[10:20:27] (CR) Aude: [C: 2] Enable usage tracking on test2.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/237074 (owner: Aude)
[10:20:33] (Merged) jenkins-bot: Enable usage tracking on test2.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/237074 (owner: Aude)
[10:21:20] !log aude@tin Synchronized
wikidataclient.dblist: Sorted dblist (duration: 00m 12s)
[10:21:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:21:49] (CR) Muehlenhoff: [C: 2 V: 2] Enable ferm on initial jobrunner in codfw [puppet] - https://gerrit.wikimedia.org/r/237069 (owner: Muehlenhoff)
[10:23:00] !log aude@tin Synchronized usagetracking.dblist: Enable usage tracking on commons and test2wiki (duration: 00m 11s)
[10:23:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:23:23] (CR) Aude: [C: 2] Remove usagetracking.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/237075 (owner: Aude)
[10:23:30] (Merged) jenkins-bot: Remove usagetracking.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/237075 (owner: Aude)
[10:26:15] !log aude@tin Synchronized wmf-config/InitialiseSettings.php: rv usage tracking (duration: 00m 12s)
[10:26:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:26:34] just a notice, but still need to fix
[10:29:12] (PS1) Aude: Remove legacy usage setting from Wikibase.php [mediawiki-config] - https://gerrit.wikimedia.org/r/237076
[10:30:00] (CR) Aude: [C: 2] Remove legacy usage setting from Wikibase.php [mediawiki-config] - https://gerrit.wikimedia.org/r/237076 (owner: Aude)
[10:30:10] (Merged) jenkins-bot: Remove legacy usage setting from Wikibase.php [mediawiki-config] - https://gerrit.wikimedia.org/r/237076 (owner: Aude)
[10:30:46] !log aude@tin Synchronized wmf-config/Wikibase.php: (no message) (duration: 00m 12s)
[10:30:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:31:23] better
[10:33:04] !log aude@tin Synchronized wmf-config/CommonSettings.php: Remove unused usagetracking tag (duration: 00m 11s)
[10:33:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:33:51] hmmm, how to sync removing of a dblist
file?
[10:34:02] PROBLEM - puppet last run on db2062 is CRITICAL: CRITICAL: puppet fail
[10:36:14] (PS2) Hashar: debian: fix lintian error about bad dist name [software/conftool] - https://gerrit.wikimedia.org/r/226910
[10:36:26] (CR) jenkins-bot: [V: -1] debian: fix lintian error about bad dist name [software/conftool] - https://gerrit.wikimedia.org/r/226910 (owner: Hashar)
[10:37:10] (PS1) Muehlenhoff: Disable ferm on mw2080, needs additional rules for rsyncd [puppet] - https://gerrit.wikimedia.org/r/237077
[10:37:19] Ops-Access-Requests, operations: Requesting access to stat1003, stat1002 and bast1001 for JMinor - https://phabricator.wikimedia.org/T111872#1620136 (jcrespo) p:Triage>Normal Hello, Joshuua, Pleased to meet you. jkatz current has access to the following groups: ``` bastiononly, researchers, stat...
[10:38:43] (CR) Muehlenhoff: [C: 2 V: 2] Disable ferm on mw2080, needs additional rules for rsyncd [puppet] - https://gerrit.wikimedia.org/r/237077 (owner: Muehlenhoff)
[10:39:05] Ops-Access-Requests, operations, Continuous-Integration-Infrastructure, Patch-For-Review: Let contint-admins force run puppet with /usr/local/sbin/puppet-run - https://phabricator.wikimedia.org/T110943#1620139 (hashar) Apparently the ops meeting ran out of time. So should be talked about again in t...
[10:39:11] (CR) Hashar: "Apparently the ops meeting ran out of time. So should be talked about again in the next ops meeting on Monday Sep. 25th."
[puppet] - 10https://gerrit.wikimedia.org/r/234539 (https://phabricator.wikimedia.org/T110943) (owner: 10Hashar) [10:41:46] could use a revert patch to unbreak puppet on labnodepool1001 : https://gerrit.wikimedia.org/r/#/c/237062/ :D [10:42:15] moritzm: patch above would get rid of the git-deploy/trebuchet error from this morning if you can land it in [10:42:29] * aude thinks removing dblist might work with scap, but otherwise not sure and maybe doesn't need to be done now [10:42:36] I am switching to plain git::clone and manualy deploy instead ( https://gerrit.wikimedia.org/r/#/c/237065/ ) [10:44:29] 6operations, 7Database: Puppetize grants for mysql backups on dbstore hosts - https://phabricator.wikimedia.org/T111929#1620166 (10jcrespo) p:5Triage>3Normal [10:46:03] 6operations: Nutcracker stats monitoring should only listen on localhost - https://phabricator.wikimedia.org/T111934#1620173 (10MoritzMuehlenhoff) 3NEW [10:47:24] 6operations, 7Database: Upgrade db1022, which has an older kernel - https://phabricator.wikimedia.org/T101516#1620185 (10jcrespo) [10:48:33] (03PS1) 10Milimetric: [WIP] Add dashiki module and role [puppet] - 10https://gerrit.wikimedia.org/r/237079 (https://phabricator.wikimedia.org/T110351) [10:49:27] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Add dashiki module and role [puppet] - 10https://gerrit.wikimedia.org/r/237079 (https://phabricator.wikimedia.org/T110351) (owner: 10Milimetric) [10:52:51] RECOVERY - Restbase root url on xenon is OK: HTTP OK: HTTP/1.1 200 - 15150 bytes in 0.014 second response time [10:53:16] (03CR) 10Daniel Kinzler: [C: 031] "The rools look good to me, I checked them against RdfVocabulary.php. 
No idea whether this is the right place or way to declare them though" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/230483 (https://phabricator.wikimedia.org/T97195) (owner: 10Smalyshev) [10:53:40] RECOVERY - Restbase endpoints health on xenon is OK: All endpoints are healthy [10:58:22] (03CR) 10JanZerebecki: [C: 04-1] "This is the right place and way. Address Daniels comment. Add puppet code to put wikidata-uris.incl into the right place, see git grep api" [puppet] - 10https://gerrit.wikimedia.org/r/230483 (https://phabricator.wikimedia.org/T97195) (owner: 10Smalyshev) [11:00:20] RECOVERY - puppet last run on db2062 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:47] (03CR) 10JanZerebecki: Create real URIs for wikidata RDF URIs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/230483 (https://phabricator.wikimedia.org/T97195) (owner: 10Smalyshev) [11:10:48] akosiaris: on https://gerrit.wikimedia.org/r/#/c/231574/3/modules/restbase/templates/config.analytics.yaml.erb under "storage_groups", you say let's discuss [11:11:39] milimetric: yes [11:11:51] so these are the wikis restbase is proxying for, no ? [11:11:51] wanna talk here or you busy? [11:12:42] sure, let's talk [11:12:59] uh... I think these are the storage groups that cassandra is allowed to set up... not sure :) [11:13:55] I hadn't looked at this section honestly, just copied it. It seems to work for us regardless of what domain we tell cassandra we want [11:14:08] but it seems wrong, yea [11:14:38] maybe I'd just delete the first two and leave the catch-all with a TODO to talk to gabriel? [11:14:39] the entire section is being used via a yaml reference in line 92 [11:14:48] and not sure what that does [11:14:57] yeah sure [11:15:39] it seems to power the /{api:sys}: path ? 
[11:15:41] whatever that is [11:15:57] k, thx, I'll push my replies to all your other comments, but let me talk to joseph too and have him check before we bother you again [11:16:23] ok thanks [11:17:39] (03PS4) 10Milimetric: [WIP] Add an Analytics specific instance of RESTBase [puppet] - 10https://gerrit.wikimedia.org/r/231574 (https://phabricator.wikimedia.org/T107056) [11:17:46] (03CR) 10Milimetric: [WIP] Add an Analytics specific instance of RESTBase (0317 comments) [puppet] - 10https://gerrit.wikimedia.org/r/231574 (https://phabricator.wikimedia.org/T107056) (owner: 10Milimetric) [11:19:06] (03CR) 10Steinsplitter: "wikilove database needs to be crated before merging." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235907 (https://phabricator.wikimedia.org/T106264) (owner: 10MarcoAurelio) [11:19:16] (03CR) 10Steinsplitter: [C: 031] Enabling extension WikiLove for outreachwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/235907 (https://phabricator.wikimedia.org/T106264) (owner: 10MarcoAurelio) [11:19:20] 6operations, 10ops-codfw, 5Patch-For-Review: rack & initial setup of elastic2001-2024 - https://phabricator.wikimedia.org/T111080#1620303 (10fgiunchedi) done: * switch port configuration for each machine the following is needed: * set bios boot mode to legacy from the boot menu (doesn't seem possible via ss... [11:40:09] akosiaris: milimetric: the storage groups are used to separate different domains into different keyspaces [11:40:34] too many keyspaces - bad for cassandra, idem for only one keyspace with a gazillion domains [11:42:17] I'd propose to keep them, but that really depends on the amount of data and traffic per domain [11:42:55] thanks mobrovac. 
Yeah, I guess it depends on how we do our initial domain setup too [11:43:40] milimetric: exactly [11:43:52] (03PS1) 10Muehlenhoff: Add a ferm define for mw_appserver_networks (needed for scap::proxy) [puppet] - 10https://gerrit.wikimedia.org/r/237085 [11:44:03] depends if you are going to set up your RB instance with only one domain or with different ones [11:46:49] 7Blocked-on-Operations, 6operations, 6Phabricator, 10Traffic: Phabricator needs to expose ssh and notification daemon (websocket) - https://phabricator.wikimedia.org/T100519#1620387 (10mmodell) [11:48:44] 6operations, 6Labs, 10Salt: salt does not run reliably for toollabs / labs generally - https://phabricator.wikimedia.org/T99213#1620396 (10ArielGlenn) key regen completed, did another round of deleting keys of deleted instances, now checking on minions with authentication errors in the logs. [11:49:57] (03PS2) 10Muehlenhoff: Add a ferm define for mw_appserver_networks (needed for scap::proxy) [puppet] - 10https://gerrit.wikimedia.org/r/237085 [11:56:11] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [11:57:21] (03PS1) 10Muehlenhoff: Add ferm rules for scap proxy [puppet] - 10https://gerrit.wikimedia.org/r/237087 [12:01:26] (03PS1) 10Muehlenhoff: Enable ferm on initial jobrunner in codfw (mw2081) [puppet] - 10https://gerrit.wikimedia.org/r/237088 [12:04:21] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [12:08:42] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable ferm on initial jobrunner in codfw (mw2081) [puppet] - 10https://gerrit.wikimedia.org/r/237088 (owner: 10Muehlenhoff) [12:18:54] (03PS1) 10Merlijn van Deen: toolserver: do not escape when redirecting from http to https [puppet] - 10https://gerrit.wikimedia.org/r/237089 (https://phabricator.wikimedia.org/T111839) [12:33:04] (03PS1) 10Muehlenhoff: Enable initial imagescaler in codfw [puppet] - 
10https://gerrit.wikimedia.org/r/237093 [12:38:21] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable initial imagescaler in codfw [puppet] - 10https://gerrit.wikimedia.org/r/237093 (owner: 10Muehlenhoff) [12:48:31] 10Ops-Access-Requests, 6operations: Requesting access to stat1003, stat1002 and bast1001 for JMinor - https://phabricator.wikimedia.org/T111872#1620502 (10Ottomata) These are the correct groups! :) Ok with me. [12:48:43] (03PS2) 10coren: toolserver: do not escape when redirecting from http to https [puppet] - 10https://gerrit.wikimedia.org/r/237089 (https://phabricator.wikimedia.org/T111839) (owner: 10Merlijn van Deen) [12:50:06] (03CR) 10coren: [C: 032] "This should fix it." [puppet] - 10https://gerrit.wikimedia.org/r/237089 (https://phabricator.wikimedia.org/T111839) (owner: 10Merlijn van Deen) [12:56:35] (03CR) 10Joal: [C: 031] "Looks good to me." [puppet] - 10https://gerrit.wikimedia.org/r/231574 (https://phabricator.wikimedia.org/T107056) (owner: 10Milimetric) [12:56:55] hey akosiaris [12:57:26] we (milimetric and I) have reviewed the puppet changes for analytics [12:57:46] Let me know if there is anything else I can do to help [12:57:54] akosiaris: --^ Thx ! [13:02:25] !log enabled ferm on various initial mediawiki hosts in codfw: videoscaler (mw2007), appserver (mw200[89]), jobrunner (mw2081), api (mw2050), imagescaler (mw2086) [13:02:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:05:00] (03PS2) 10Muehlenhoff: Enable ferm on mw1017 (test.wikipedia.org) [puppet] - 10https://gerrit.wikimedia.org/r/237068 [13:05:20] !log issuing Cassandra repair on restbase1001 (nodetool repair -pr) [13:05:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:08:23] (03PS1) 1020after4: SSH repo hosting support for phabricator. 
[puppet] - 10https://gerrit.wikimedia.org/r/237096 (https://phabricator.wikimedia.org/T128) [13:09:09] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable ferm on mw1017 (test.wikipedia.org) [puppet] - 10https://gerrit.wikimedia.org/r/237068 (owner: 10Muehlenhoff) [13:10:10] (03PS2) 1020after4: phab: use mysql slave not master for scripts [puppet] - 10https://gerrit.wikimedia.org/r/236944 (https://phabricator.wikimedia.org/T111547) (owner: 10Dzahn) [13:10:24] (03CR) 1020after4: [C: 031] phab: use mysql slave not master for scripts [puppet] - 10https://gerrit.wikimedia.org/r/236944 (https://phabricator.wikimedia.org/T111547) (owner: 10Dzahn) [13:13:41] (03CR) 1020after4: "Should we go ahead and deploy this?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/234425 (https://phabricator.wikimedia.org/T89532) (owner: 10BryanDavis) [13:14:01] !log enabled ferm on test.wikipedia.org (mw1017) [13:14:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:16:31] (03CR) 1020after4: "I can't quite understand how it would be caused by ipv6 myself. It specifically happens with the sync-dir and sync-file subcommands, but n" [tools/scap] - 10https://gerrit.wikimedia.org/r/234687 (owner: 10BryanDavis) [13:17:43] (03PS4) 10Andrew Bogott: Revert "nodepool: setup scripts are in integration/config" [puppet] - 10https://gerrit.wikimedia.org/r/237062 (https://phabricator.wikimedia.org/T111925) (owner: 10Hashar) [13:35:08] (03CR) 10Mobrovac: [C: 04-1] "Looks rather good. Some rather minor comments in-lined." (037 comments) [puppet] - 10https://gerrit.wikimedia.org/r/231574 (https://phabricator.wikimedia.org/T107056) (owner: 10Milimetric) [13:39:15] (03CR) 10Mforns: [C: 04-1] "The -1 is related to the first comment, I think it is a typo." 
(032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/236947 (https://phabricator.wikimedia.org/T106254) (owner: 10Ottomata) [13:41:53] 6operations, 6Labs, 5Patch-For-Review: labs salt master on jessie fails to install salt-master - https://phabricator.wikimedia.org/T110032#1620623 (10fgiunchedi) 5Open>3Resolved a:3fgiunchedi this works now, newly provisioned jessie below, thanks @Andrew! ```lines=15 filippo@test-cassandra3:~$ apt-cac... [13:47:31] 10Ops-Access-Requests, 6operations: Add Matanya to "restricted" to perform server side uploads - https://phabricator.wikimedia.org/T106447#1620631 (10jcrespo) 5Open>3stalled Being annoying to @mark again, after 15 days as it is my "duty". [13:47:59] (03CR) 10Anomie: [C: 031] logging: Configure monolog to output stack traces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236994 (https://phabricator.wikimedia.org/T89169) (owner: 10BryanDavis) [13:52:37] 6operations, 10fundraising-tech-ops: package udp-filter for Trusty, for use on fundraising banner_logger - https://phabricator.wikimedia.org/T110592#1620647 (10Jgreen) >>! In T110592#1610336, @Ottomata wrote: > Hm, Jeff, both udp-filter and libanon are installed from packages on stat1002, which is a Trusty box... [13:54:32] (03CR) 10Nuria: "Looks good. will update docs." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236985 (owner: 10Ottomata) [13:54:47] 6operations, 10fundraising-tech-ops: package udp-filter for Trusty, for use on fundraising banner_logger - https://phabricator.wikimedia.org/T110592#1620649 (10Ottomata) Ah hm. Ok! Since they work fine there, maybe we can just reprepro copy them into Trusty? 
[14:00:37] (03PS1) 10Aude: Set entityAccessLimit for WikibaseClient wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237097 [14:01:11] (03CR) 10Andrew Bogott: [C: 032] Revert "nodepool: setup scripts are in integration/config" [puppet] - 10https://gerrit.wikimedia.org/r/237062 (https://phabricator.wikimedia.org/T111925) (owner: 10Hashar) [14:01:42] (03PS3) 10Andrew Bogott: nodepool: setup scripts are in integration/config [puppet] - 10https://gerrit.wikimedia.org/r/237065 (https://phabricator.wikimedia.org/T111925) (owner: 10Hashar) [14:01:59] (03PS2) 10Dzahn: lists: hold mail to lists.wm.o [puppet] - 10https://gerrit.wikimedia.org/r/233750 (https://phabricator.wikimedia.org/T110136) (owner: 10John F. Lewis) [14:02:58] (03CR) 10Andrew Bogott: [C: 032] nodepool: setup scripts are in integration/config [puppet] - 10https://gerrit.wikimedia.org/r/237065 (https://phabricator.wikimedia.org/T111925) (owner: 10Hashar) [14:03:19] !log beginning mailman migration - expect lists to be down [14:03:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:03:38] (03PS3) 10Dzahn: lists: hold mail to lists.wm.o [puppet] - 10https://gerrit.wikimedia.org/r/233750 (https://phabricator.wikimedia.org/T110136) (owner: 10John F. Lewis) [14:04:35] (03CR) 10Dzahn: [C: 032] lists: hold mail to lists.wm.o [puppet] - 10https://gerrit.wikimedia.org/r/233750 (https://phabricator.wikimedia.org/T110136) (owner: 10John F. 
Lewis) [14:12:17] \0/ @ mailman [14:12:58] (03PS2) 10Aude: Set entityAccessLimit for WikibaseClient wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237097 [14:14:43] 6operations, 10Wikimedia-Mailing-lists, 5Patch-For-Review: hold lists.wikimedia.org with exim - https://phabricator.wikimedia.org/T110136#1620689 (10Dzahn) 5Open>3Resolved [14:14:44] 6operations, 10Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1620690 (10Dzahn) [14:15:02] mutante: I'm here [14:15:29] paravoid: ok great, thanks [14:15:59] :) [14:17:26] RECOVERY - puppet last run on labnodepool1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:18:45] 6operations, 10Wikimedia-Mailing-lists: shut down mailman on sodium - https://phabricator.wikimedia.org/T110137#1620691 (10Dzahn) 5Open>3Resolved [14:18:46] 6operations, 10Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1620692 (10Dzahn) [14:23:17] (03PS1) 10Muehlenhoff: Enable ferm on hadoop master [puppet] - 10https://gerrit.wikimedia.org/r/237099 [14:24:10] (03PS1) 10Muehlenhoff: Enablke ferm for hadoop standby [puppet] - 10https://gerrit.wikimedia.org/r/237100 [14:24:45] PROBLEM - Host silicon is DOWN: PING CRITICAL - Packet loss = 100% [14:27:16] mark: any chance to get a comment on https://phabricator.wikimedia.org/T106447 ? [14:27:34] a reject is also a valid response. [14:28:16] PROBLEM - Exim SMTP on sodium is CRITICAL: Connection refused [14:28:30] :) [14:29:56] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl [14:30:06] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner [14:30:39] I suppose I can ack sodium alerts? 
[14:30:47] jynus: yeah [14:30:50] planned migration [14:30:58] yes, wanted confirmation [14:31:03] ACKNOWLEDGEMENT - Exim SMTP on sodium is CRITICAL: Connection refused daniel_zahn migration [14:31:03] ACKNOWLEDGEMENT - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl daniel_zahn migration [14:31:03] ACKNOWLEDGEMENT - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner daniel_zahn migration [14:31:12] or mutante can :) [14:31:13] (03PS2) 10Andrew Bogott: Labs: Allow self-hosted puppetmasters to auto-sign certificates [puppet] - 10https://gerrit.wikimedia.org/r/235621 (owner: 10Tim Landscheidt) [14:31:24] I know the rights issue, BTW, JohnFLewis [14:31:37] jynus: ack, i just wanted to see it to confirm [14:31:50] jynus: hm? [14:31:53] perfect, just use me as a tool [14:32:01] if you need me :-) [14:33:13] (03CR) 10Andrew Bogott: [C: 032] Labs: Allow self-hosted puppetmasters to auto-sign certificates [puppet] - 10https://gerrit.wikimedia.org/r/235621 (owner: 10Tim Landscheidt) [14:37:36] jynus: thanks, perfect [14:37:52] we probably should use that more :-) [14:38:25] heh :p [14:39:32] and lets say a mourn for lucid, too [14:41:26] that's the only major positive and motivation here - no lucid :P [14:42:36] the only one? What about the joy of helping millions of people communicate faster and more efficiently? [14:42:59] is it faster and more efficient? :) [14:43:16] legoktm K-Lined? oops [14:43:18] well, that sounds better than maintenance upgrade, so I use the former [14:43:32] the changelog high light is ops can run lists-list and only show public archive lists! :p [14:44:16] nah, there is much more though [14:44:55] https://phabricator.wikimedia.org/T110140#1582771 is the long version [14:44:57] like poster passwords, security, i18n and running software not made in 2008! 
[14:45:10] so a bunch of bug fixes too [14:45:27] and don't forget better SSL ciphers due to Apache 2.4 [14:45:40] apache 2.4 is also a major improvement indeed [14:51:01] (03PS1) 10Mjbmr: Add Wikijunior and Cookbook namespaces to wgContentNamespaces for fawikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237104 (https://phabricator.wikimedia.org/T76663) [14:54:01] (03PS2) 10Mjbmr: Add Wikijunior and Cookbook namespaces to wgContentNamespaces for fawikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237104 (https://phabricator.wikimedia.org/T76663) [14:54:25] (03PS3) 10Mjbmr: Add Wikijunior and Cookbook namespaces to wgContentNamespaces for fawikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237104 (https://phabricator.wikimedia.org/T76663) [14:55:11] (03PS2) 10Jcrespo: Add perf-admins group and add to relevant roles [puppet] - 10https://gerrit.wikimedia.org/r/236847 (https://phabricator.wikimedia.org/T110926) (owner: 10Ori.livneh) [14:55:40] (03PS6) 10Alex Monk: Disallow indexing for /api/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236200 (https://phabricator.wikimedia.org/T109023) (owner: 10GWicke) [15:00:04] anomie ostriches thcipriani marktraceur Krenair: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150909T1500). Please do the needful. [15:00:04] aharoni bd808 aude: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [15:00:39] Shalom. [15:00:59] Hey [15:01:14] Is SWAT now, or am I confused with timezones? 
[15:01:40] aharoni: it's now, the bot announced it just before you joined [15:01:46] RECOVERY - Host silicon is UP: PING OK - Packet loss = 0%, RTA = 3.39 ms [15:01:48] lovely [15:01:58] shalom aharoni :-) [15:02:00] Okay, let's see [15:02:03] aharoni: you can query the bot like this: [15:02:06] jouncebot: next [15:02:06] In 0 hour(s) and 57 minute(s): Labs OpenStack upgrade (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150909T1600) [15:02:06] I have two patches to Babel configuration. [15:02:10] * aude here [15:02:20] i think it needs "current" [15:02:20] although I guess it doesn't tell you about ongoing that just started heh [15:02:21] o/ [15:02:24] jouncebot: current [15:02:29] jouncebot: help [15:02:37] ugh, it notices the channel? [15:02:43] ugh [15:02:44] that's neat heh [15:02:46] lol [15:02:47] did not expect this :p [15:02:55] good job pinging everyone. :P [15:03:01] It used to send all messages as notices [15:03:03] aharoni, so this is fine because those categories are the ones actively in use by the wiki's templates, right? [15:03:14] * bd808 has been meaning to patch that [15:03:16] Yes. [15:03:27] can anyone send the die command to that bot? [15:03:27] I prefer to start from https://gerrit.wikimedia.org/r/#/c/236042/ . [15:03:33] (A site with less users.) [15:03:35] it's good for trolling, thanks [15:03:44] Krenair: yeah [15:03:52] (Good for testing.) [15:04:05] aharoni, okay. in future, list them in the order you want them done in :p [15:04:39] (03CR) 10Alex Monk: [C: 032] Configure $wgBabelCategoryNames for the Ladino Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236042 (owner: 10Amire80) [15:05:07] (03Merged) 10jenkins-bot: Configure $wgBabelCategoryNames for the Ladino Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236042 (owner: 10Amire80) [15:05:22] bd808: do you know how to sync the removal of a dblist file? scap? 
[15:05:36] * aude doesn't think sync-file would work [15:05:39] (03PS1) 10BBlack: mobile: remove "Temp test" CC header block for icons/gadget [puppet] - 10https://gerrit.wikimedia.org/r/237105 (https://phabricator.wikimedia.org/T109286) [15:05:42] yeah, scap will do it. sync-dir would too [15:05:47] ah [15:06:09] sync-dir of common? [15:06:16] but dblists are in the root of the whole sync tree so that's almost the same as a scap [15:06:17] i know there is sync-common [15:06:22] yeah [15:06:42] I've asked this question probably a couple of times before. Maybe it should be documented :p [15:06:43] at the end of swat, we might want to do that for completeness [15:06:49] Oh, I know what you can do [15:06:53] there's a better way [15:06:56] aude, you can sync only dblists [15:07:00] sync-common is the script that gets run on the MW hosts when sync-*/scap is used on tin [15:07:04] sync-dblist [15:07:07] i know but does that handle removal? [15:07:19] I seem to recall it does [15:07:22] oh [15:07:24] oh yeah I think we tested that at some point [15:07:29] * aude can try it after swat [15:07:59] * aude no longer needs usagetracking.dblist since all wikis (that have wikibase) have the feature now :) [15:08:04] aude, you appear to have left mediawiki-staging in an odd state [15:08:08] oh [15:08:10] how [15:08:21] master is at 17b41cfd5872d942f5064e369cb05f4de217b163 (remove usagetracking.dblist) [15:08:22] * aude looks [15:08:26] ok [15:08:35] yeah, i couldn't sync that [15:08:39] and then you've checked out 1a8ee3357e4ee6bd175a6337a7edf823a2b5eb41 on top of it (Remove legacy usage setting from Wikibase.php) [15:08:48] Krenair: is it supposed to be deployed already? [15:08:49] yeah i synced that [15:08:56] aharoni, no [15:09:03] OK, waiting patiently. 
[15:09:07] * aude looks [15:09:26] aude, I just sorted it [15:09:47] ok, thanks [15:09:54] * aude wants to do sync dblist at the end of swat [15:10:11] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/236042/ (duration: 00m 13s) [15:10:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:10:26] aharoni, now it's live [15:10:53] Krenair: fantastic - https://lad.wikipedia.org/wiki/Usador:Amire80 [15:11:01] the categories at the bottom are as expected. [15:11:14] now https://gerrit.wikimedia.org/r/#/c/236025/ [15:11:40] 6operations, 10Wikimedia-Mailing-lists, 5Patch-For-Review: rsync the diff since mail was held on sodium - https://phabricator.wikimedia.org/T110138#1620825 (10Dzahn) archives, lists, qfiles and heldmsg's transferred over [15:11:47] (03CR) 10Alex Monk: [C: 032] Configure $wgBabelCategoryNames for the Hebrew Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236025 (owner: 10Amire80) [15:11:55] 6operations, 10Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1620828 (10Dzahn) [15:11:56] 6operations, 10Wikimedia-Mailing-lists, 5Patch-For-Review: rsync the diff since mail was held on sodium - https://phabricator.wikimedia.org/T110138#1620827 (10Dzahn) 5Open>3Resolved [15:12:16] (03Merged) 10jenkins-bot: Configure $wgBabelCategoryNames for the Hebrew Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236025 (owner: 10Amire80) [15:12:31] * aude recommends to +2 https://gerrit.wikimedia.org/r/#/c/237091/ (allow jenkins to work, while config changes go out) [15:12:50] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/236025/ (duration: 00m 11s) [15:12:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:14:03] 6operations, 10Wikimedia-Mailing-lists: rsync exim spool directory - 
https://phabricator.wikimedia.org/T110440#1620844 (10Dzahn) "input" and "msglog" directories rsynced over to fermium [15:15:10] aharoni, all good? [15:15:17] Krenair: Yep, just checked in Hebrew Wikipedia. [15:15:18] It works. [15:15:25] Thanks, that's it for me. [15:15:53] (03CR) 10Alex Monk: "(Do not merge before September 15th)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236045 (https://phabricator.wikimedia.org/T44894) (owner: 10Greg Grossmeier) [15:15:56] !log Running sync-common manually on mw2187.codfw.wmnet. Host is missing l10n cache files [15:16:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:16:53] 6operations, 10Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1620865 (10Dzahn) [15:16:55] 6operations, 10Wikimedia-Mailing-lists: rsync exim spool directory - https://phabricator.wikimedia.org/T110440#1620863 (10Dzahn) 5Open>3Resolved ran import script on fermium. "mailq" shows pending mails on fermium now. (3824, 3823 on sodium) [15:17:15] RECOVERY - Apache HTTP on mw2187 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.149 second response time [15:17:45] bd808, so when we do https://gerrit.wikimedia.org/r/#/c/236994/1 we should expect hhvm.log to start containing stack traces? 
[15:17:55] RECOVERY - HHVM rendering on mw2187 is OK: HTTP OK: HTTP/1.1 200 OK - 65981 bytes in 2.000 second response time [15:18:06] not hhvm.log, but fatal.log and exception.log [15:18:10] ah [15:18:15] * aude welcomes that [15:18:39] hhvm.log is from hhvm directly and will take patches upstream [15:18:39] ah right [15:18:47] configuring monolog is fun [15:18:57] which I'd like to get to at some point but haven't tried yet [15:19:21] entries in fatal.log already have stack traces [15:19:21] (03CR) 10Jcrespo: [C: 031] Add perf-admins group and add to relevant roles [puppet] - 10https://gerrit.wikimedia.org/r/236847 (https://phabricator.wikimedia.org/T110926) (owner: 10Ori.livneh) [15:19:51] some times, yes [15:20:12] the new config will start to make prettier traces with this week's branch [15:20:18] (03CR) 10John F. Lewis: [C: 031] mailman: set new settings to improve security [puppet] - 10https://gerrit.wikimedia.org/r/235384 (owner: 10John F. Lewis) [15:20:23] \o/ [15:20:36] i miss pretty traces :) [15:20:37] (03CR) 10Alex Monk: [C: 032] logging: Configure monolog to output stack traces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236994 (https://phabricator.wikimedia.org/T89169) (owner: 10BryanDavis) [15:20:50] The traces we get right now are mostly #0 MWExceptionHandler::handleFatalError() [15:20:58] true [15:21:03] (03Merged) 10jenkins-bot: logging: Configure monolog to output stack traces [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236994 (https://phabricator.wikimedia.org/T89169) (owner: 10BryanDavis) [15:21:08] the new branch has a fix for that [15:21:10] Krenair: should i +2 my patch for wmf22? 
[15:21:15] since jenkins usually takes time [15:21:32] aude, you're next in the queue, so go for it [15:21:35] ok [15:21:57] !log krenair@tin Synchronized wmf-config/logging.php: https://gerrit.wikimedia.org/r/#/c/236994/ (duration: 00m 12s) [15:21:58] * aude would prefer if i could +2 sometime else, and then not automatically merge the submodule bump in core [15:22:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:22:06] to have it all ready [15:22:21] (03PS2) 10Hashar: nodepool: easily switch to nodepool user [puppet] - 10https://gerrit.wikimedia.org/r/234483 [15:22:21] 6operations, 10Traffic, 7HTTPS, 5Patch-For-Review: Insecure POST traffic - https://phabricator.wikimedia.org/T105794#1620912 (10BBlack) Latest updates on traffic sampling on the text cluster: ``` root@cp1065:~# grep User-Agent postua5.log|cut -d: -f2-|egrep 'Kindle|Dalvik'|wc -l 395 root@cp1065:~# grep Us... [15:22:53] (03CR) 10Mjbmr: [C: 04-1] "Please add it to wgAddGroups and wgRemoveGroups to be granted/revoked by admins as well." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237049 (https://phabricator.wikimedia.org/T111753) (owner: 10MarcoAurelio) [15:23:53] (03CR) 10Hashar: "I am no more overriding the nodepool user login shell. Instead introduce a convenience wrapper 'become-nodepool' which has the magic sudo" [puppet] - 10https://gerrit.wikimedia.org/r/234483 (owner: 10Hashar) [15:24:07] bd808, everything looking okay to you? [15:24:25] I don't see anything melting, no :) [15:25:03] (03CR) 10Alex Monk: [C: 032] Enable logging of XMP debug log channel for severity >=warning [mediawiki-config] - 10https://gerrit.wikimedia.org/r/234425 (https://phabricator.wikimedia.org/T89532) (owner: 10BryanDavis) [15:25:26] can i still fit a patch into morning swat? 
[15:25:28] (03Merged) 10jenkins-bot: Enable logging of XMP debug log channel for severity >=warning [mediawiki-config] - 10https://gerrit.wikimedia.org/r/234425 (https://phabricator.wikimedia.org/T89532) (owner: 10BryanDavis) [15:25:39] deployed something last night to 22, but meant to put it in 21 which is running enwiki and dewiki [15:25:41] 6operations, 10Traffic, 10Wikimedia-Apache-configuration, 5Patch-For-Review: wikiversity.org and wikinews.org redirects to /503.html - https://phabricator.wikimedia.org/T109226#1620915 (10BBlack) Recap since we kinda stopped working on this temporarily: Both @ori and I have played with various patch varian... [15:25:42] ebernhardson, yes [15:25:58] awsome, making patch [15:26:21] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234425/ (duration: 00m 12s) [15:26:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:27:15] aude, heh, "DEPLOY NOTE: sync InitialiseSettings.php first, then Wikibase.php" [15:27:19] xmp logging is working [15:27:31] I would hope people deploying config changes can recognise which order things need doing in from the changes [15:27:38] errr [15:27:42] let me check [15:27:50] i think the opposite [15:28:08] No it's InitialiseSettings first, the message is right [15:28:13] just that you felt it needed to be said [15:28:15] no, the message is correct [15:29:03] so what's wrong? [15:29:18] nothing wrong [15:29:51] * aude hopes people deploying know without saying, but sometimes they ask [15:30:46] aude, okay, so looking at the patch itself [15:30:50] k [15:30:54] it sounds like this will affect nlwiki and wikidatawiki [15:31:13] yeah, 1 page on nlwiki [15:31:27] and any pages on wikidata that use more than 500 are already broken, with lua errors [15:31:33] right [15:31:52] do the nlwiki guys know about the issue with that page? 
[15:32:38] or we can set a higher limit for everyone [15:32:44] like 500 for everyone [15:33:31] or we can set the nlwiki limit to the maximum currently used [15:33:55] or 400 for everyone, 500 for wikidata [15:33:59] * aude in favor of that [15:34:04] ok [15:34:34] (03Abandoned) 10BBlack: HTTPS: Break insecure POST with 403 [puppet] - 10https://gerrit.wikimedia.org/r/221974 (https://phabricator.wikimedia.org/T105794) (owner: 10BBlack) [15:35:29] (03PS3) 10Aude: Set entityAccessLimit for WikibaseClient wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237097 [15:35:29] * aude amends [15:36:27] aude, for the wmf22 wikidata change, I'm literally just going to sync-dir the extension in the normal way [15:36:30] is that right? [15:36:36] sync-dir is good [15:37:04] I don't know about some of the files there like composer.lock or vendor/* [15:37:54] the vendor one just changed the order of wikibase in the list of installed things [15:38:27] suppose that can happen depending on version of composer i am using [15:38:50] !log krenair@tin Synchronized php-1.26wmf22/extensions/Wikidata: https://gerrit.wikimedia.org/r/#/c/237091/ (duration: 00m 21s) [15:38:53] aude, please test [15:38:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:38:58] ok [15:39:02] Krenair: i totally spaced on you still doing the wikidata ones .... so my two patches are pulled to tin and rebased, but not shipped yet. 
lemme know when you are done and i will push the button [15:39:06] (03PS1) 10BBlack: Limit insecure POST to text-cluster only [puppet] - 10https://gerrit.wikimedia.org/r/237110 (https://phabricator.wikimedia.org/T105794) [15:39:19] 6operations, 7Database: Drop *_old database tables from Wikimedia wikis - https://phabricator.wikimedia.org/T54932#1620986 (10jcrespo) p:5Normal>3Low [15:39:25] looks good [15:39:31] 6operations, 7Database: Drop *_old database tables from Wikimedia wikis - https://phabricator.wikimedia.org/T54932#1620995 (10jcrespo) 5Open>3stalled [15:40:14] ebernhardson, ugh... [15:40:25] ebernhardson, what is your patch? [15:40:48] Krenair: same one as last night, but this time on the correct branch. it turns on the completion suggester experiement in wmf21 [15:41:45] where is this patch on tin? [15:41:58] Krenair: core and wikimedia events [15:42:01] i can just push it ... [15:42:22] okay, those are fine [15:42:46] please be careful about touching mediawiki-staging on tin during deployment windows, or when others are shown as logged into the machine [15:43:01] !log ebernhardson@tin Synchronized php-1.26wmf21/extensions/WikimediaEvents/: Enable suggester AB experiement (duration: 00m 11s) [15:43:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:43:37] !log ebernhardson@tin Synchronized php-1.26wmf21/resources/src/mediawiki/mediawiki.searchSuggest.js: Enable completion suggester AB experiment (duration: 00m 12s) [15:43:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:45:05] (03CR) 10Alex Monk: [C: 032] Set entityAccessLimit for WikibaseClient wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237097 (owner: 10Aude) [15:45:32] (03Merged) 10jenkins-bot: Set entityAccessLimit for WikibaseClient wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237097 (owner: 10Aude) [15:45:55] thanks [15:46:16] !log krenair@tin Synchronized 
wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/237097/ (duration: 00m 12s) [15:46:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:46:31] 6operations, 10Traffic, 7HTTPS: HTTPS Plans (tracking / high-level info) - https://phabricator.wikimedia.org/T104681#1621042 (10BBlack) [15:46:42] !log krenair@tin Synchronized wmf-config/Wikibase.php: https://gerrit.wikimedia.org/r/#/c/237097/ (duration: 00m 12s) [15:46:44] aude, please check ^ [15:46:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:47:06] looks ok [15:47:08] ebernhardson, oh and other thing - you were only supposed to be doing one patch [15:49:20] (03CR) 10Alex Monk: [C: 032] Add Wikijunior and Cookbook namespaces to wgContentNamespaces for fawikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237104 (https://phabricator.wikimedia.org/T76663) (owner: 10Mjbmr) [15:49:35] 6operations, 10ops-codfw, 5Patch-For-Review: provision wmf5846 and wmf5848 - https://phabricator.wikimedia.org/T111697#1621053 (10fgiunchedi) @papaul what are the switch ports for these servers? also `wmf5845` [15:49:46] (03Merged) 10jenkins-bot: Add Wikijunior and Cookbook namespaces to wgContentNamespaces for fawikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237104 (https://phabricator.wikimedia.org/T76663) (owner: 10Mjbmr) [15:50:13] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/237104/ (duration: 00m 12s) [15:50:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:53:57] (03PS3) 10Alex Monk: Tidy up more comments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236701 (https://phabricator.wikimedia.org/T31902) [15:55:34] Krenair: sorry, they are the same patch just in different repos due to how the code is split... 
[15:55:51] 6operations, 10Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1621061 (10Elitre) >>! In T105756#1600617, @Dzahn wrote: >>>! In T105756#1600447, @Ariconte wrote: >> Are you going to tell the readers??? To forestall all the 'The... [15:57:04] (03CR) 10Alex Monk: [C: 032] Tidy up more comments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236701 (https://phabricator.wikimedia.org/T31902) (owner: 10Alex Monk) [15:57:10] (03Merged) 10jenkins-bot: Tidy up more comments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236701 (https://phabricator.wikimedia.org/T31902) (owner: 10Alex Monk) [15:57:29] bd808: in the CirrusSearch.log file every line is now appended with %exception%, is that your stuff? [15:57:41] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/236701/ - noop (duration: 00m 12s) [15:57:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:57:50] ebernhardson: yeah. it will go away with the new branch this week [15:58:05] 10Ops-Access-Requests, 6operations: Requesting access to stat1003, stat1002 and bast1001 for JMinor - https://phabricator.wikimedia.org/T111872#1621075 (10JKatzWMF) approved [15:58:10] bd808: ok, thats fine as long as i didn't break anything new :) [15:59:20] (03CR) 10CSteipp: [C: 031] Disallow indexing for /api/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236200 (https://phabricator.wikimedia.org/T109023) (owner: 10GWicke) [16:00:02] (03CR) 10Alex Monk: [C: 032] Disallow indexing for /api/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236200 (https://phabricator.wikimedia.org/T109023) (owner: 10GWicke) [16:00:05] andrewbogott: Dear anthropoid, the time has come. Please deploy Labs OpenStack upgrade (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150909T1600). 
[16:00:36] (03Merged) 10jenkins-bot: Disallow indexing for /api/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/236200 (https://phabricator.wikimedia.org/T109023) (owner: 10GWicke) [16:01:05] !log krenair@tin Synchronized robots.txt: https://gerrit.wikimedia.org/r/#/c/236200/ (duration: 00m 12s) [16:01:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:04:27] (03CR) 10Filippo Giunchedi: [C: 031] "I'm going to merge this tomorrow if there are no objections" [puppet] - 10https://gerrit.wikimedia.org/r/236389 (https://phabricator.wikimedia.org/T108953) (owner: 10Eevans) [16:05:33] Krenair: are you done? [16:05:42] * aude doesn't want to forget about sync-dblists [16:05:53] aude, yep [16:05:57] ok [16:06:03] * aude can do that [16:06:17] if this works then we should document it on wikitech [16:06:38] !log aude@tin Synchronized database lists: Remove unused usagetracking.dblist (duration: 00m 12s) [16:06:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:06:51] done and hopefully that worked [16:07:13] * aude looks on mw1017 [16:07:48] looks good :) [16:08:12] 6operations, 10Traffic, 7HTTPS: Preload HSTS for select hostnames within wikimedia.org - https://phabricator.wikimedia.org/T111967#1621104 (10BBlack) 3NEW a:3BBlack [16:08:22] 6operations, 10Traffic, 7HTTPS: Preload HSTS for select hostnames within wikimedia.org - https://phabricator.wikimedia.org/T111967#1621104 (10BBlack) [16:08:23] 6operations, 10Traffic: Fix/decom multiple-subdomain wikis in wikimedia.org - https://phabricator.wikimedia.org/T102826#1374594 (10BBlack) [16:11:58] https://wikitech.wikimedia.org/w/index.php?title=How_to_deploy_code&type=revision&diff=176549&oldid=176444 [16:12:22] 6operations, 6Community-Advocacy, 10Traffic: Fix/decom multiple-subdomain wikis in wikimedia.org - https://phabricator.wikimedia.org/T102826#1621139 (10BBlack) [16:12:40] (03PS7) 10Andrew Bogott: Added openstack config files for 
version Kilo [puppet] - 10https://gerrit.wikimedia.org/r/235399 (https://phabricator.wikimedia.org/T110045) [16:12:55] (03PS2) 10Andrew Bogott: Switch labcontrol1001 to Openstack Kilo [puppet] - 10https://gerrit.wikimedia.org/r/236950 [16:14:48] (03CR) 10Andrew Bogott: [C: 032] Added openstack config files for version Kilo [puppet] - 10https://gerrit.wikimedia.org/r/235399 (https://phabricator.wikimedia.org/T110045) (owner: 10Andrew Bogott) [16:15:18] (03PS3) 10Andrew Bogott: Switch labcontrol1001 to Openstack Kilo [puppet] - 10https://gerrit.wikimedia.org/r/236950 [16:15:55] (03PS2) 10Andrew Bogott: Switched labnet hosts to Openstack Kilo [puppet] - 10https://gerrit.wikimedia.org/r/236951 [16:17:15] (03CR) 10Andrew Bogott: [C: 032] Switch labcontrol1001 to Openstack Kilo [puppet] - 10https://gerrit.wikimedia.org/r/236950 (owner: 10Andrew Bogott) [16:20:08] 6operations, 6Community-Advocacy, 10Traffic: Fix/decom multiple-subdomain wikis in wikimedia.org - https://phabricator.wikimedia.org/T102826#1621167 (10BBlack) `www.meta` and `www.commons` I think we can just remove unilaterally at this point - they've been effectively broken for HTTPS for a while now, they... [16:24:00] (03CR) 10MaxSem: [C: 031] mobile: remove "Temp test" CC header block for icons/gadget [puppet] - 10https://gerrit.wikimedia.org/r/237105 (https://phabricator.wikimedia.org/T109286) (owner: 10BBlack) [16:24:01] Krenair: ping? [16:24:24] PROBLEM - mailman_ctl on fermium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl [16:24:54] PROBLEM - mailman_qrunner on fermium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner [16:25:04] revi, pong [16:26:53] (03CR) 10MarcoAurelio: "Oops! For some reason I forgot that. Thanks Mjbmr, I'll submit an amendment for this. Regards." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/237049 (https://phabricator.wikimedia.org/T111753) (owner: 10MarcoAurelio) [16:27:42] Since I see you were doing be-x-old stuff, do you think be-x-old in wmf-config/abusefilter.php fixed? [16:27:56] (operations/mediawiki-config) [16:28:19] no more playing tennis? [16:28:31] I prefer table tennis [16:28:52] (03PS2) 10Andrew Bogott: Switch labvirt1005 to openstack kilo [puppet] - 10https://gerrit.wikimedia.org/r/236952 [16:28:54] (03PS3) 10Andrew Bogott: Switched labnet hosts to Openstack Kilo [puppet] - 10https://gerrit.wikimedia.org/r/236951 [16:28:56] (03PS1) 10Andrew Bogott: Rename logdir to log_dir [puppet] - 10https://gerrit.wikimedia.org/r/237135 [16:30:10] (03CR) 10Andrew Bogott: [C: 032] Rename logdir to log_dir [puppet] - 10https://gerrit.wikimedia.org/r/237135 (owner: 10Andrew Bogott) [16:32:48] (03PS2) 10MarcoAurelio: Enabling 'flood' flag at scowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237049 (https://phabricator.wikimedia.org/T111753) [16:34:47] Krenair: see above (I'm going to bed so...) 
[16:35:53] revi, the line in abusefilter.php needs to remain the same [16:35:59] reason is that it uses the database name [16:36:05] which cannot be changed right now [16:36:33] (03PS3) 10MarcoAurelio: Enabling 'flood' flag at scowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237049 (https://phabricator.wikimedia.org/T111753) [16:37:20] (03CR) 10Andrew Bogott: [C: 032] Switched labnet hosts to Openstack Kilo [puppet] - 10https://gerrit.wikimedia.org/r/236951 (owner: 10Andrew Bogott) [16:47:45] !log systemctl stop nodepool on labnodepool1001 [16:47:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:57:56] RECOVERY - mailman_ctl on fermium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl [16:58:11] 6operations, 10ops-eqiad: ps1-a5 -eqiad power not balanced - https://phabricator.wikimedia.org/T111973#1621306 (10Cmjohnson) 3NEW a:3Cmjohnson [16:59:24] RECOVERY - mailman_qrunner on fermium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner [17:01:52] (03PS1) 10Robmoen: Fix typo with example survey schema config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237145 (https://phabricator.wikimedia.org/T111974) [17:05:05] (03PS3) 10Andrew Bogott: Switch labvirt1005 to openstack kilo [puppet] - 10https://gerrit.wikimedia.org/r/236952 [17:05:07] (03PS1) 10Andrew Bogott: Move rabbit config settings into [oslo_messaging_rabbit] [puppet] - 10https://gerrit.wikimedia.org/r/237146 [17:06:14] (03CR) 10Andrew Bogott: [C: 032] Move rabbit config settings into [oslo_messaging_rabbit] [puppet] - 10https://gerrit.wikimedia.org/r/237146 (owner: 10Andrew Bogott) [17:14:19] (03PS4) 10Andrew Bogott: Switch labvirt1005 to openstack kilo [puppet] - 10https://gerrit.wikimedia.org/r/236952 [17:14:21] (03PS1) 10Andrew Bogott: Change logging import for kilo [puppet] - 10https://gerrit.wikimedia.org/r/237148 [17:14:32] (03PS1) 10MarcoAurelio: Modify user rights 
configuration at frwiki for changetags [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237149 (https://phabricator.wikimedia.org/T98629) [17:14:47] jynus, got it, thanks. I also commented in support on the proposal talk page. For now, should I use both Blocked-by-operations and Blocked-by-schema-change if we're blocked? [17:15:26] (03CR) 10Andrew Bogott: [C: 032] Change logging import for kilo [puppet] - 10https://gerrit.wikimedia.org/r/237148 (owner: 10Andrew Bogott) [17:15:37] matt_flaschen, in meeting [17:15:46] (03PS2) 10MarcoAurelio: Modify user rights configuration at frwiki for changetags [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237149 (https://phabricator.wikimedia.org/T98629) [17:18:13] 6operations, 10Traffic, 7HTTPS, 5Patch-For-Review: Insecure POST traffic - https://phabricator.wikimedia.org/T105794#1621425 (10TheDJ) I've taken the liberty of sending a mail to the support department of grandpad.net [17:21:15] 6operations, 10hardware-requests: Request three servers for Pageview API - https://phabricator.wikimedia.org/T111053#1621429 (10akosiaris) Those machines are in 3 different rack rows indeed, so they might very well be good. ``` analytics1011 => A rack row analytics1015 => C rack row analytics1016 => C rack ro... [17:35:21] matt_flaschen, the blocked by schema change is only a proposal to make the workflow easier, I think some people disagree with its creation [17:35:41] jynus, okay, I'll used blocked-by-operations for now. 
[17:44:09] (03CR) 10Jdlrobson: [C: 04-1] Fix typo with example survey schema config (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237145 (https://phabricator.wikimedia.org/T111974) (owner: 10Robmoen) [17:49:44] PROBLEM - nova-network process on labnet1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-network [17:51:44] RECOVERY - nova-network process on labnet1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-network [17:53:44] (03CR) 10Andrew Bogott: [C: 032] Switch labvirt1005 to openstack kilo [puppet] - 10https://gerrit.wikimedia.org/r/236952 (owner: 10Andrew Bogott) [17:55:55] PROBLEM - nova-compute process on labvirt1008 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute [17:56:35] PROBLEM - nova-compute process on labvirt1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute [17:59:32] mutante, JohnFLewis: How's that mailman migration going? [18:00:04] twentyafterfour: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150909T1800). Please do the needful. [18:01:56] PROBLEM - mailman_qrunner on fermium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner [18:02:45] PROBLEM - mailman_ctl on fermium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl [18:03:05] RECOVERY - nova-compute process on labvirt1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [18:04:24] RECOVERY - nova-compute process on labvirt1008 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [18:04:29] Krenair: unfortunately back to sodium for now [18:06:04] yeah... lucid still :( [18:06:07] mutante, damn. something went wrong? 
[18:06:20] (03PS3) 10Jcrespo: Add perf-admins group and add to relevant roles [puppet] - 10https://gerrit.wikimedia.org/r/236847 (https://phabricator.wikimedia.org/T110926) (owner: 10Ori.livneh) [18:07:26] (03CR) 10Jcrespo: [C: 032] Add perf-admins group and add to relevant roles [puppet] - 10https://gerrit.wikimedia.org/r/236847 (https://phabricator.wikimedia.org/T110926) (owner: 10Ori.livneh) [18:08:08] (03PS1) 10Dzahn: Revert "lists: hold mail to lists.wm.o" [puppet] - 10https://gerrit.wikimedia.org/r/237160 [18:08:15] (03PS1) 10John F. Lewis: Revert "lists: hold mail to lists.wm.o" [puppet] - 10https://gerrit.wikimedia.org/r/237161 [18:08:23] :-) [18:08:23] same idea :) [18:08:30] (03Abandoned) 10John F. Lewis: Revert "lists: hold mail to lists.wm.o" [puppet] - 10https://gerrit.wikimedia.org/r/237161 (owner: 10John F. Lewis) [18:09:13] (03PS2) 10Dzahn: Revert "lists: hold mail to lists.wm.o" [puppet] - 10https://gerrit.wikimedia.org/r/237160 [18:09:55] Krenair: yes, messed up things with the rsync/import/mv and another sync would have taken too long [18:10:12] deploying the train to group1 [18:10:42] (03PS1) 1020after4: group1 wikis to 1.26wmf22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237163 [18:11:15] (03CR) 1020after4: [C: 032] group1 wikis to 1.26wmf22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237163 (owner: 1020after4) [18:11:23] (03Merged) 10jenkins-bot: group1 wikis to 1.26wmf22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237163 (owner: 1020after4) [18:11:30] (03CR) 10Dzahn: [C: 032] Revert "lists: hold mail to lists.wm.o" [puppet] - 10https://gerrit.wikimedia.org/r/237160 (owner: 10Dzahn) [18:11:37] mutante, JohnFLewis: this change did absolutely nothing [18:11:46] both the commit and the revert [18:11:49] completely no-ops [18:12:12] hold_domains is just a name of a domainlist and it's not one that is being referenced by the rest of the config [18:13:32] 6operations, 6Services: [DRAFT] Services team roadmap October - 
December 2015 (Q2 2015/16) - https://phabricator.wikimedia.org/T111819#1621663 (10GWicke) [18:15:09] !log twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf22 [18:15:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:16:10] (03PS3) 10Ottomata: Modify eventlogging graphite alerts so that they are based on kafka metrics [puppet] - 10https://gerrit.wikimedia.org/r/236947 (https://phabricator.wikimedia.org/T106254) [18:18:39] (03PS4) 10Ottomata: Modify eventlogging graphite alerts so that they are based on kafka metrics [puppet] - 10https://gerrit.wikimedia.org/r/236947 (https://phabricator.wikimedia.org/T106254) [18:24:10] (03Abandoned) 10Robmoen: Fix typo with example survey schema config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/237145 (https://phabricator.wikimedia.org/T111974) (owner: 10Robmoen) [18:29:30] (03CR) 10Ottomata: [C: 032] Modify eventlogging graphite alerts so that they are based on kafka metrics [puppet] - 10https://gerrit.wikimedia.org/r/236947 (https://phabricator.wikimedia.org/T106254) (owner: 10Ottomata) [18:32:58] (03PS1) 10Andrew Bogott: Maintain support for juno-style compute RPC [puppet] - 10https://gerrit.wikimedia.org/r/237174 [18:34:15] (03PS2) 10Andrew Bogott: Maintain support for juno-style compute RPC [puppet] - 10https://gerrit.wikimedia.org/r/237174 [18:35:17] (03CR) 10Andrew Bogott: [V: 032] Maintain support for juno-style compute RPC [puppet] - 10https://gerrit.wikimedia.org/r/237174 (owner: 10Andrew Bogott) [18:35:33] (03CR) 10Andrew Bogott: [C: 032] Maintain support for juno-style compute RPC [puppet] - 10https://gerrit.wikimedia.org/r/237174 (owner: 10Andrew Bogott) [18:41:59] (03PS1) 10Andrew Bogott: Drop conductor api version down. 
[puppet] - 10https://gerrit.wikimedia.org/r/237175 [18:42:14] 6operations, 7Database: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#1621742 (10jcrespo) @faidon Yes, replication over SSL has negligible cost due to usually maintaining 1 single open connection all the time. Probably the only time blocker would be potentially having to restar... [18:43:02] (03CR) 10Andrew Bogott: [C: 032] Drop conductor api version down. [puppet] - 10https://gerrit.wikimedia.org/r/237175 (owner: 10Andrew Bogott) [18:44:35] 6operations, 10hardware-requests: Request three servers for Pageview API - https://phabricator.wikimedia.org/T111053#1621746 (10JAllemandou) Right, we'll need to update our puppet accordingly @milimetric :) [18:44:43] 6operations, 10ops-codfw, 5Patch-For-Review: provision wmf5846 and wmf5848 - https://phabricator.wikimedia.org/T111697#1621747 (10Papaul) resbase-test2001 ge-5/0/19 resbase-test2002 ge-5/0/18 resbase-test2003 ge-5/0/16 [18:46:06] 6operations, 10Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1621752 (10Dzahn) [18:46:06] 6operations, 10Wikimedia-Mailing-lists: rsync exim spool directory - https://phabricator.wikimedia.org/T110440#1621751 (10Dzahn) 5Resolved>3Open [18:46:11] 6operations, 10Wikimedia-Mailing-lists, 5Patch-For-Review: rsync the diff since mail was held on sodium - https://phabricator.wikimedia.org/T110138#1621753 (10Dzahn) 5Resolved>3Open [18:46:12] 6operations, 10Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1450894 (10Dzahn) [18:46:18] 6operations, 10Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1450894 (10Dzahn) [18:46:19] 6operations, 10Wikimedia-Mailing-lists: shut down mailman on sodium - https://phabricator.wikimedia.org/T110137#1621755 (10Dzahn) 
5Resolved>3Open [18:46:52] 6operations: icinga (neon) is out of CPU headroom - https://phabricator.wikimedia.org/T110822#1621757 (10jcrespo) We already discussed this on Ops meeting, the immediate solution would be in hardware, or maybe splitting it into 2 servers. [18:47:22] PROBLEM - Kafka Broker Replica Max Lag on kafka1013 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [5000000.0] [18:47:28] 6operations: icinga (neon) is out of CPU headroom - https://phabricator.wikimedia.org/T110822#1621758 (10jcrespo) [18:47:29] 6operations, 7Icinga: improve icinga performance - https://phabricator.wikimedia.org/T85222#1621759 (10jcrespo) [18:47:44] hm [18:47:57] 6operations, 7Icinga: improve icinga performance - https://phabricator.wikimedia.org/T85222#941720 (10jcrespo) Merging because I think both proposals are about the same thing. [18:49:14] 6operations, 6Services: [DRAFT] Services team roadmap October - December 2015 (Q2 2015/16) - https://phabricator.wikimedia.org/T111819#1621762 (10GWicke) [18:49:23] RECOVERY - Kafka Broker Replica Max Lag on kafka1013 is OK: OK: Less than 1.00% above the threshold [1000000.0] [18:49:37] 6operations, 7Icinga: improve icinga performance / solve general load issues - https://phabricator.wikimedia.org/T85222#1621766 (10jcrespo) [18:49:48] hm, strange. [18:50:03] 6operations, 7Icinga: improve icinga performance / solve general load issues on neon - https://phabricator.wikimedia.org/T85222#941720 (10jcrespo) [18:51:05] (03PS1) 10Andrew Bogott: Well, this feature is poorly documented. [puppet] - 10https://gerrit.wikimedia.org/r/237176 [18:53:17] (03CR) 10Andrew Bogott: [C: 032] Well, this feature is poorly documented. 
[puppet] - 10https://gerrit.wikimedia.org/r/237176 (owner: 10Andrew Bogott) [18:54:16] I think the script for auto-creating tasks for access requests is broken, but I do not know how [18:59:50] (03PS1) 10Andrew Bogott: set compat levels back to 'Juno' [puppet] - 10https://gerrit.wikimedia.org/r/237179 [19:00:54] (03CR) 10Andrew Bogott: [C: 032] set compat levels back to 'Juno' [puppet] - 10https://gerrit.wikimedia.org/r/237179 (owner: 10Andrew Bogott) [19:01:48] 6operations, 10Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1621806 (10Dzahn) Unfortunately the migration didn't work out this time. We made some last minute changes to the rsync/import scripts to use mv instead of rsync to ma... [19:16:46] (03PS1) 10Andrew Bogott: Last ditch attempt to make use of upgrade levels. [puppet] - 10https://gerrit.wikimedia.org/r/237181 [19:18:05] (03CR) 10Andrew Bogott: [C: 032] Last ditch attempt to make use of upgrade levels. [puppet] - 10https://gerrit.wikimedia.org/r/237181 (owner: 10Andrew Bogott) [19:23:01] jynus, which tasks have you seen it broken for? [19:23:33] 6operations, 10Traffic, 7HTTPS, 5Patch-For-Review: Insecure POST traffic - https://phabricator.wikimedia.org/T105794#1621887 (10Cyberpower678) Glad to see my bots are no longer listed. [19:25:22] (03PS1) 10Andrew Bogott: Remove [upgrade_levels] section. [puppet] - 10https://gerrit.wikimedia.org/r/237184 [19:25:38] Krenair, sorry, I lack of context [19:26:06] I think the script for auto-creating tasks for access requests is broken, but I do not know how [19:26:26] (03CR) 10Andrew Bogott: [C: 032] Remove [upgrade_levels] section. 
[puppet] - 10https://gerrit.wikimedia.org/r/237184 (owner: 10Andrew Bogott) [19:26:30] a couple of things [19:27:05] https://phabricator.wikimedia.org/T111955 [19:27:49] also, if I manually create an access review, it has strange results [19:28:13] its on my activity log, but to be fair, it is quite full today [19:28:49] do you know where those scripts are, so maybe I can understand what's wrong? [19:29:43] jynus: my understanding is it only works if the security level is set to access request and probably only works when the task is actually created, not later (may be wrong) [19:35:46] (03PS2) 10BBlack: mobile: remove "Temp test" CC header block for icons/gadget [puppet] - 10https://gerrit.wikimedia.org/r/237105 (https://phabricator.wikimedia.org/T109286) [19:36:27] but what I want is not to be created automatically, not the other way round [19:36:56] (03CR) 10BBlack: [C: 032] mobile: remove "Temp test" CC header block for icons/gadget [puppet] - 10https://gerrit.wikimedia.org/r/237105 (https://phabricator.wikimedia.org/T109286) (owner: 10BBlack) [19:45:35] (03PS1) 10Ottomata: Remove eventlogging ZMQ based monitoring [puppet] - 10https://gerrit.wikimedia.org/r/237188 [19:46:02] (03PS2) 10Ottomata: Remove eventlogging ZMQ based monitoring [puppet] - 10https://gerrit.wikimedia.org/r/237188 (https://phabricator.wikimedia.org/T106254) [19:46:16] (03PS1) 10BBlack: remove www.commons, www.meta, pa.us [dns] - 10https://gerrit.wikimedia.org/r/237189 (https://phabricator.wikimedia.org/T102826) [19:46:18] (03PS1) 10BBlack: remove www.nl and noboard.chapters [dns] - 10https://gerrit.wikimedia.org/r/237190 (https://phabricator.wikimedia.org/T102826) [19:46:26] (03PS1) 10BBlack: Remove redirects for www.(meta|commons), pa.us [puppet] - 10https://gerrit.wikimedia.org/r/237191 (https://phabricator.wikimedia.org/T102826) [19:46:28] (03PS1) 10BBlack: Remove redirects for www.nl, noboard.chapters [puppet] - 10https://gerrit.wikimedia.org/r/237192 
(https://phabricator.wikimedia.org/T102826) [19:48:26] 6operations, 7Database: Physical location SPOF because of database server distribution on a single rack (D1) - https://phabricator.wikimedia.org/T111992#1622166 (10jcrespo) [19:49:05] 6operations, 7Database: Spikes of job runner new connection errors to mysql "Error connecting to 10.64.32.24: Can't connect to MySQL server on '10.64.32.24' (4)" - https://phabricator.wikimedia.org/T107072#1622173 (10jcrespo) T111992 And this confirms my suspicious about the traffic. [19:57:15] (03CR) 10Ottomata: [C: 032] Remove eventlogging ZMQ based monitoring [puppet] - 10https://gerrit.wikimedia.org/r/237188 (https://phabricator.wikimedia.org/T106254) (owner: 10Ottomata) [20:00:04] gwicke cscott arlolra subbu mdholloway: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150909T2000). Please do the needful. [20:02:39] deploying parsoid now [20:09:14] will this be fixing https://phabricator.wikimedia.org/T111818, subbu? [20:09:35] Krenair, yes. 
see deployment log https://www.mediawiki.org/wiki/Parsoid/Deployments [20:10:35] 6operations, 10Traffic, 7HTTPS: investigate/remove hostname login.m.wikimedia.org - https://phabricator.wikimedia.org/T111998#1622283 (10BBlack) 3NEW a:3BBlack [20:13:49] (03PS1) 10Yuvipanda: k8s: Allow insecure API access only to localhost [puppet] - 10https://gerrit.wikimedia.org/r/237246 [20:13:56] (03CR) 10jenkins-bot: [V: 04-1] k8s: Allow insecure API access only to localhost [puppet] - 10https://gerrit.wikimedia.org/r/237246 (owner: 10Yuvipanda) [20:14:45] (03PS2) 10Yuvipanda: k8s: Allow insecure API access only to localhost [puppet] - 10https://gerrit.wikimedia.org/r/237246 [20:15:21] (03CR) 10Yuvipanda: [C: 032 V: 032] k8s: Allow insecure API access only to localhost [puppet] - 10https://gerrit.wikimedia.org/r/237246 (owner: 10Yuvipanda) [20:17:54] !log deployed parsoid version ffd0b444 [20:18:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:20:57] 6operations, 10ops-eqiad, 10netops: cr1-eqiad PEM 2 failure - https://phabricator.wikimedia.org/T112000#1622323 (10faidon) 3NEW a:3Cmjohnson [20:21:27] 6operations, 10ops-eqiad, 10netops: cr2-eqiad PEM 2 failure - https://phabricator.wikimedia.org/T112000#1622334 (10faidon) [20:22:22] (03PS1) 10Ottomata: EventLogging client side processor now consumes from Kafka [puppet] - 10https://gerrit.wikimedia.org/r/237249 (https://phabricator.wikimedia.org/T106260) [20:23:11] (03CR) 10jenkins-bot: [V: 04-1] EventLogging client side processor now consumes from Kafka [puppet] - 10https://gerrit.wikimedia.org/r/237249 (https://phabricator.wikimedia.org/T106260) (owner: 10Ottomata) [20:23:47] (03PS2) 10Ottomata: EventLogging client side processor now consumes from Kafka [puppet] - 10https://gerrit.wikimedia.org/r/237249 (https://phabricator.wikimedia.org/T106260) [20:26:30] (03PS3) 10Ottomata: EventLogging client side processor now consumes from Kafka [puppet] - 10https://gerrit.wikimedia.org/r/237249 
(https://phabricator.wikimedia.org/T106260) [20:28:34] (03CR) 10Ottomata: [C: 032] EventLogging client side processor now consumes from Kafka [puppet] - 10https://gerrit.wikimedia.org/r/237249 (https://phabricator.wikimedia.org/T106260) (owner: 10Ottomata) [20:29:02] 6operations, 10Analytics-EventLogging, 10MediaWiki-extensions-NavigationTiming, 6Performance-Team: Increase maxUrlSize from 1000 to 1500 - https://phabricator.wikimedia.org/T112002#1622378 (10Krinkle) 3NEW [20:29:42] 6operations, 10Analytics-EventLogging, 10MediaWiki-extensions-NavigationTiming, 6Performance-Team: Increase maxUrlSize from 1000 to 1500 - https://phabricator.wikimedia.org/T112002#1622394 (10Krinkle) p:5Triage>3High [20:30:30] 6operations, 10Analytics-EventLogging, 10MediaWiki-extensions-NavigationTiming, 6Performance-Team: Increase maxUrlSize from 1000 to 1500 - https://phabricator.wikimedia.org/T112002#1622378 (10Krinkle) [20:37:43] 10Ops-Access-Requests, 6operations: Requesting access to stat1003, stat1002 and bast1001 for JMinor - https://phabricator.wikimedia.org/T111872#1622430 (10JMinor) Nice to meet you too Jaime. I have read and signed the access agreement. Let me know if you need anything else from me. Thanks! [20:40:45] (03PS1) 10Ottomata: Leave eventlogging server side raw forwarder outputting to zmq until we are ready to turn off all zmq [puppet] - 10https://gerrit.wikimedia.org/r/237255 (https://phabricator.wikimedia.org/T106260) [20:41:56] (03CR) 10Ottomata: [C: 032] Leave eventlogging server side raw forwarder outputting to zmq until we are ready to turn off all zmq [puppet] - 10https://gerrit.wikimedia.org/r/237255 (https://phabricator.wikimedia.org/T106260) (owner: 10Ottomata) [20:42:22] PROBLEM - Check status of defined EventLogging jobs on eventlog1001 is CRITICAL: CRITICAL: Stopped EventLogging jobs: processor/client-side-0 [20:42:45] deploying a thing, that's fine! 
[20:43:40] (PS1) Filippo Giunchedi: install_server: fix restbase-test2* addresses [puppet] - https://gerrit.wikimedia.org/r/237257
[20:43:53] (PS2) Filippo Giunchedi: install_server: fix restbase-test2* addresses [puppet] - https://gerrit.wikimedia.org/r/237257
[20:44:05] (CR) Filippo Giunchedi: [C: 2 V: 2] install_server: fix restbase-test2* addresses [puppet] - https://gerrit.wikimedia.org/r/237257 (owner: Filippo Giunchedi)
[20:44:32] RECOVERY - Check status of defined EventLogging jobs on eventlog1001 is OK: OK: All defined EventLogging jobs are runnning.
[20:45:19] Blocked-on-Operations, operations, Phabricator, Traffic: Phabricator needs to expose ssh and notification daemon (websocket) - https://phabricator.wikimedia.org/T100519#1622458 (mmodell)
[20:54:00] (PS1) Ottomata: Make eventlogging files consumer consume from Kafka instead of ZMQ [puppet] - https://gerrit.wikimedia.org/r/237261 (https://phabricator.wikimedia.org/T106260)
[20:54:51] (CR) jenkins-bot: [V: -1] Make eventlogging files consumer consume from Kafka instead of ZMQ [puppet] - https://gerrit.wikimedia.org/r/237261 (https://phabricator.wikimedia.org/T106260) (owner: Ottomata)
[20:56:14] (PS2) Ottomata: Make eventlogging files consumer consume from Kafka instead of ZMQ [puppet] - https://gerrit.wikimedia.org/r/237261 (https://phabricator.wikimedia.org/T106260)
[20:57:14] (CR) Madhuvishy: [C: 1] Make eventlogging files consumer consume from Kafka instead of ZMQ [puppet] - https://gerrit.wikimedia.org/r/237261 (https://phabricator.wikimedia.org/T106260) (owner: Ottomata)
[20:59:48] (PS3) Ottomata: Make eventlogging files consumer consume from Kafka instead of ZMQ [puppet] - https://gerrit.wikimedia.org/r/237261 (https://phabricator.wikimedia.org/T106260)
[21:01:04] (CR) Ottomata: [C: 2] Make eventlogging files consumer consume from Kafka instead of ZMQ [puppet] - https://gerrit.wikimedia.org/r/237261
(https://phabricator.wikimedia.org/T106260) (owner: Ottomata)
[21:03:01] (PS1) Ottomata: Not using hiera for kafka consumer args for file consumers [puppet] - https://gerrit.wikimedia.org/r/237262
[21:03:21] (CR) Ottomata: [C: 2 V: 2] Not using hiera for kafka consumer args for file consumers [puppet] - https://gerrit.wikimedia.org/r/237262 (owner: Ottomata)
[21:05:55] (PS1) Ottomata: Use different default consumer group name for files consumers [puppet] - https://gerrit.wikimedia.org/r/237263
[21:06:13] (CR) Ottomata: [C: 2 V: 2] Use different default consumer group name for files consumers [puppet] - https://gerrit.wikimedia.org/r/237263 (owner: Ottomata)
[21:08:31] operations, Wikimedia-Language-setup, Wikimedia-Site-Requests: Rename Võro Wikipedia, fiu-vro -> vro - https://phabricator.wikimedia.org/T31186#1622563 (Vorok) Rename fiu-vro to vro is still needed and there is consensus that it should be done.
[21:13:08] (PS1) Ottomata: Set raw=True for raw file event consumers [puppet] - https://gerrit.wikimedia.org/r/237267
[21:13:30] (CR) Ottomata: [C: 2 V: 2] Set raw=True for raw file event consumers [puppet] - https://gerrit.wikimedia.org/r/237267 (owner: Ottomata)
[21:14:26] (PS5) Milimetric: [WIP] Add an Analytics specific instance of RESTBase [puppet] - https://gerrit.wikimedia.org/r/231574 (https://phabricator.wikimedia.org/T107056)
[21:14:28] (CR) Milimetric: [WIP] Add an Analytics specific instance of RESTBase (7 comments) [puppet] - https://gerrit.wikimedia.org/r/231574 (https://phabricator.wikimedia.org/T107056) (owner: Milimetric)
[21:15:31] operations, Traffic, HTTPS, Patch-For-Review: Insecure POST traffic - https://phabricator.wikimedia.org/T105794#1622598 (grandpad) This is the team from grandpad.net - @TheDJ thank you for reaching out to us. We've corrected the problem (changed http to https) and we're rolling out the change to al...
[21:16:01] (CR) Milimetric: "Thanks for the comments, Marko, I addressed them inline." [puppet] - https://gerrit.wikimedia.org/r/231574 (https://phabricator.wikimedia.org/T107056) (owner: Milimetric)
[21:27:37] operations, Traffic, HTTPS, Patch-For-Review: Insecure POST traffic - https://phabricator.wikimedia.org/T105794#1622632 (BBlack) >>! In T105794#1621425, @TheDJ wrote: > I've taken the liberty of sending a mail to the support department of grandpad.net >>! In T105794#1622598, @grandpad wrote: > Th...
[21:31:31] (CR) Mobrovac: [C: -1] "One minor thing wrt the pageviews sub-spec. Otherwise, LGTM" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/231574 (https://phabricator.wikimedia.org/T107056) (owner: Milimetric)
[21:33:16] operations, Traffic: Upgrade codfw,ulsfo,esams LVS to jessie - https://phabricator.wikimedia.org/T96375#1622639 (jcrespo)
[21:35:49] operations, Analytics-EventLogging, MediaWiki-extensions-NavigationTiming, Performance-Team: Increase maxUrlSize from 1000 to 1500 - https://phabricator.wikimedia.org/T112002#1622642 (BBlack) Varnish is the primary issue here. Raising shm_reclen is non-trivial, especially to much-larger values. W...
[21:35:52] bd808 yt?
[21:40:20] operations, Analytics-EventLogging, MediaWiki-extensions-NavigationTiming, Performance-Team: Increase maxUrlSize from 1000 to 1500 - https://phabricator.wikimedia.org/T112002#1622651 (ori) >>! In T112002#1622642, @BBlack wrote: > I think we can work out how to raise it to 2048 safely pretty easily....
[21:52:35] (PS1) Dzahn: mailman: don't run service on migration host [puppet] - https://gerrit.wikimedia.org/r/237275
[21:52:37] (PS1) Dzahn: mailman: no more importing to an import dir [puppet] - https://gerrit.wikimedia.org/r/237276
[21:52:39] (PS1) Dzahn: fermium: back to migration role [puppet] - https://gerrit.wikimedia.org/r/237277
[21:58:00] (PS2) Dzahn: mailman: don't run service on migration host [puppet] - https://gerrit.wikimedia.org/r/237275
[21:59:04] (CR) Dzahn: [C: 2] mailman: don't run service on migration host [puppet] - https://gerrit.wikimedia.org/r/237275 (owner: Dzahn)
[21:59:38] (PS2) Dzahn: mailman: no more importing to an import dir [puppet] - https://gerrit.wikimedia.org/r/237276
[22:00:13] (PS3) Dzahn: mailman: no more importing to an import dir [puppet] - https://gerrit.wikimedia.org/r/237276 (https://phabricator.wikimedia.org/T105756)
[22:01:52] (PS4) Dzahn: mailman: no more importing to an import dir [puppet] - https://gerrit.wikimedia.org/r/237276 (https://phabricator.wikimedia.org/T105756)
[22:02:54] (PS5) Dzahn: mailman: no more importing to an import dir [puppet] - https://gerrit.wikimedia.org/r/237276 (https://phabricator.wikimedia.org/T105756)
[22:03:12] (CR) Dzahn: [C: 2] mailman: no more importing to an import dir [puppet] - https://gerrit.wikimedia.org/r/237276 (https://phabricator.wikimedia.org/T105756) (owner: Dzahn)
[22:04:54] (PS2) Dzahn: fermium: back to migration role [puppet] - https://gerrit.wikimedia.org/r/237277
[22:06:33] (CR) Dzahn: [C: 2] fermium: back to migration role [puppet] - https://gerrit.wikimedia.org/r/237277 (owner: Dzahn)
[22:16:35] (PS8) Ori.livneh: Send image varnish frontend data from logs to statsd [puppet] - https://gerrit.wikimedia.org/r/234157 (https://phabricator.wikimedia.org/T105681) (owner: Gilles)
[22:22:10] (CR) Ori.livneh: Send image varnish frontend data from logs to statsd (1 comment)
[puppet] - https://gerrit.wikimedia.org/r/234157 (https://phabricator.wikimedia.org/T105681) (owner: Gilles)
[22:26:07] (PS1) Dzahn: mailman: fix duplicate declaration rsync module [puppet] - https://gerrit.wikimedia.org/r/237281
[22:26:14] (CR) jenkins-bot: [V: -1] mailman: fix duplicate declaration rsync module [puppet] - https://gerrit.wikimedia.org/r/237281 (owner: Dzahn)
[22:26:24] (PS2) Dzahn: mailman: fix duplicate declaration rsync module [puppet] - https://gerrit.wikimedia.org/r/237281
[22:27:08] (CR) Dzahn: [C: 2] mailman: fix duplicate declaration rsync module [puppet] - https://gerrit.wikimedia.org/r/237281 (owner: Dzahn)
[22:27:47] (PS3) Smalyshev: Create real URIs for wikidata RDF URIs [puppet] - https://gerrit.wikimedia.org/r/230483 (https://phabricator.wikimedia.org/T97195)
[22:29:54] (PS4) Smalyshev: Create real URIs for wikidata RDF URIs [puppet] - https://gerrit.wikimedia.org/r/230483 (https://phabricator.wikimedia.org/T97195)
[22:34:43] (PS1) EBernhardson: Update CirrusSearch config to define clusters [mediawiki-config] - https://gerrit.wikimedia.org/r/237282
[22:34:55] (PS2) EBernhardson: Update CirrusSearch config to define clusters [mediawiki-config] - https://gerrit.wikimedia.org/r/237282
[22:35:37] (CR) EBernhardson: [C: -1] "Depends on Ie6a559249b863 being deployed to all machines first."
[mediawiki-config] - https://gerrit.wikimedia.org/r/237282 (owner: EBernhardson)
[22:36:07] (PS3) EBernhardson: Update CirrusSearch config to define clusters [mediawiki-config] - https://gerrit.wikimedia.org/r/237282
[22:36:38] (PS4) EBernhardson: Update CirrusSearch config to define clusters [mediawiki-config] - https://gerrit.wikimedia.org/r/237282 (https://phabricator.wikimedia.org/T109734)
[22:36:53] (PS2) BBlack: Limit insecure POST to text-cluster only [puppet] - https://gerrit.wikimedia.org/r/237110 (https://phabricator.wikimedia.org/T105794)
[22:37:09] (CR) BBlack: [C: 2 V: 2] "Validated in the compiler" [puppet] - https://gerrit.wikimedia.org/r/237110 (https://phabricator.wikimedia.org/T105794) (owner: BBlack)
[22:38:00] (CR) BBlack: [C: 2] remove www.commons, www.meta, pa.us [dns] - https://gerrit.wikimedia.org/r/237189 (https://phabricator.wikimedia.org/T102826) (owner: BBlack)
[22:42:51] PROBLEM - puppet last run on cp3033 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:43:41] PROBLEM - puppet last run on cp1074 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:43:59] doh
[22:44:09] too bad the compiler can't parse VCL, too :P
[22:45:31] (PS1) BBlack: bugfix for 4026b5c3b [puppet] - https://gerrit.wikimedia.org/r/237286
[22:45:43] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 12 data above and 3 below the confidence bounds
[22:45:51] (CR) BBlack: [C: 2 V: 2] bugfix for 4026b5c3b [puppet] - https://gerrit.wikimedia.org/r/237286 (owner: BBlack)
[22:46:32] PROBLEM - puppet last run on cp1063 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:46:52] PROBLEM - puppet last run on cp3033 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:46:52] PROBLEM - puppet last run on cp3034 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:46:52] PROBLEM - puppet last run on cp3035 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:47:19] (PS2) BBlack:
Remove redirects for www.(meta|commons), pa.us [puppet] - https://gerrit.wikimedia.org/r/237191 (https://phabricator.wikimedia.org/T102826)
[22:47:22] PROBLEM - puppet last run on cp4007 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:47:22] PROBLEM - puppet last run on cp1072 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:47:32] PROBLEM - puppet last run on cp2021 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:47:41] PROBLEM - puppet last run on cp1074 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:48:11] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:48:12] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:48:12] PROBLEM - puppet last run on cp1049 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:48:13] PROBLEM - puppet last run on cp2020 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:48:19] (CR) BBlack: [C: 2] Remove redirects for www.(meta|commons), pa.us [puppet] - https://gerrit.wikimedia.org/r/237191 (https://phabricator.wikimedia.org/T102826) (owner: BBlack)
[22:48:23] PROBLEM - puppet last run on cp3043 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:48:32] PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:48:32] PROBLEM - puppet last run on cp3047 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:48:52] PROBLEM - puppet last run on cp3018 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:49:47] (PS1) Dzahn: mailman: change the way we rsync stuff [puppet] - https://gerrit.wikimedia.org/r/237287 (https://phabricator.wikimedia.org/T108071)
[22:49:54] (CR) jenkins-bot: [V: -1] mailman: change the way we rsync stuff [puppet] - https://gerrit.wikimedia.org/r/237287 (https://phabricator.wikimedia.org/T108071) (owner: Dzahn)
[22:50:47] (PS2) Dzahn: mailman: change the way we rsync stuff [puppet] - https://gerrit.wikimedia.org/r/237287 (https://phabricator.wikimedia.org/T108071)
[22:50:52] RECOVERY - puppet last run
on cp3033 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:50:53] (CR) jenkins-bot: [V: -1] mailman: change the way we rsync stuff [puppet] - https://gerrit.wikimedia.org/r/237287 (https://phabricator.wikimedia.org/T108071) (owner: Dzahn)
[22:51:00] (PS3) Dzahn: mailman: change the way we rsync stuff [puppet] - https://gerrit.wikimedia.org/r/237287 (https://phabricator.wikimedia.org/T108071)
[22:51:22] PROBLEM - puppet last run on cp4007 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:51:22] PROBLEM - puppet last run on cp1072 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:51:31] PROBLEM - puppet last run on cp2021 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:51:41] PROBLEM - puppet last run on cp1074 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:51:49] puppet bukakke
[22:52:11] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:52:12] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:52:12] PROBLEM - puppet last run on cp1049 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:52:13] PROBLEM - puppet last run on cp2020 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:52:23] PROBLEM - puppet last run on cp3043 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:53:51] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds
[22:55:39] (PS1) Andrew Bogott: Holmium to kilo [puppet] - https://gerrit.wikimedia.org/r/237289
[22:58:03] (CR) Andrew Bogott: [C: 2] Holmium to kilo [puppet] - https://gerrit.wikimedia.org/r/237289 (owner: Andrew Bogott)
[23:00:04] RoanKattouw ostriches rmoen Krenair: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150909T2300). Please do the needful.
[23:00:05] matt_flaschen Krenair: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:00:17] Present
[23:00:22] Can I interject another patch?
[23:00:33] Or maybe I can just run the SWAT myself?
[23:01:31] You could run SWAT yourself, especially considering you're on the list :p
[23:01:51] there's only 4 patches there, so yeah, should be fine to add another
[23:04:26] Krenair: Do I need to do anything special after updating dumpInterwiki.php ?
[23:04:33] Like rerun it or something? Or run scap?
[23:04:53] updateinterwikicache
[23:05:12] OK thanks
[23:08:32] RECOVERY - puppet last run on cp1050 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[23:08:52] RECOVERY - puppet last run on cp2020 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[23:09:04] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 13 data above and 5 below the confidence bounds
[23:09:22] RECOVERY - puppet last run on cp3043 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[23:09:23] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:09:35] RECOVERY - puppet last run on cp1063 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[23:10:22] RECOVERY - puppet last run on cp1074 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[23:10:33] RECOVERY - puppet last run on cp1051 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:11:03] RECOVERY - puppet last run on cp4007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:11:14] RECOVERY - puppet last run on cp2021 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[23:11:16] Anyone know of issues on the servers?
Having difficulty logging into hooft at the moment (and terbium but my terbium proxy goes through hooft so that's not a surprise)
[23:11:54] RECOVERY - puppet last run on cp1072 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:12:02] RECOVERY - puppet last run on cp3035 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:12:02] RECOVERY - puppet last run on cp3034 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[23:12:27] !log catrope@tin Synchronized php-1.26wmf21/extensions/Flow: SWAT (duration: 00m 29s)
[23:12:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:12:42] !log catrope@tin Synchronized php-1.26wmf21/extensions/WikimediaMaintenance: SWAT (duration: 00m 14s)
[23:12:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:13:15] !log catrope@tin Synchronized php-1.26wmf22/extensions/Flow: SWAT (duration: 00m 32s)
[23:13:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:13:29] !log catrope@tin Synchronized php-1.26wmf22/extensions/WikimediaMaintenance: SWAT (duration: 00m 13s)
[23:13:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:14:33] RECOVERY - puppet last run on cp1049 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:15:33] RECOVERY - puppet last run on cp3047 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[23:16:44] RECOVERY - puppet last run on cp3018 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[23:17:07] operations, Math, Patch-For-Review: Install texlive-extra-utils on mw appservers - https://phabricator.wikimedia.org/T109195#1623061 (Dzahn) what does texlive-extra-utils contain? --> ``` This package includes the following CTAN packages: a2ping -- Advanced PS, PDF, EPS converter.
adhocfilelist -...
[23:19:03] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 13 data above and 9 below the confidence bounds
[23:20:59] RoanKattouw, did you run the updateinterwikicache?
[23:21:05] or are you deploying something else?
[23:22:39] Oh sorry I got distracted
[23:22:40] Will do that now
[23:22:53] !log Running updateinterwikicache
[23:22:59] !log catrope@tin Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 11s)
[23:23:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Mr. Obvious
[23:23:03] !log deployed Kartotherian config updates
[23:23:07] Done
[23:23:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:23:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:24:45] operations, Math, Patch-For-Review: Install texlive-extra-utils on mw appservers - https://phabricator.wikimedia.org/T109195#1623089 (Dzahn) >>! In T109195#1543321, @Reedy wrote: > Do they just need whitelisting in the Math extension maybe then? :/ How do you do that? I cloned the Math extension but i...
[23:25:03] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 14 data above and 9 below the confidence bounds
[23:25:57] RoanKattouw, uh... oh dear
[23:26:01] okay, that didn't quite work as expected
[23:26:11] still deploying or can I log in and change things?
[23:27:31] ...that was a great time for the office wifi to randomly kick me off
[23:27:35] heh
[23:28:17] Krenair: Did you say anything after [16:26] Krenair still deploying or can I log in and change things?
[23:28:18] Krenair: I'm not doing anything right now, go bonkers
[23:28:25] no
[23:28:25] ok
[23:29:45] !log krenair@tin Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 12s)
[23:29:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:31:35] !log krenair@tin Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 12s)
[23:34:09] Krenair: So what was wrong?
[23:34:18] !log krenair@tin Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 12s)
[23:34:23] be-x-old and be-tarask interwiki links aren't working as I was expecting
[23:34:34] I thought we needed to instead alias them the other way around (rather than removing them as the commit did)
[23:34:37] but that didn't have any effect
[23:35:35] instead I've set the cluster back to using the previous one and will fiddle around only on mw1017
[23:38:23] urgh, but there is no mwscript on mw1017, dammit
[23:38:39] that works anyway
[23:40:24] I think because it uses www-data there
[23:40:32] but 'apache' is hardcoded in mwscript
[23:41:58] will have to get that fixed somehow later
[23:42:38] Blocked-on-Operations, Puppet, Reading-Infrastructure-Team, Sentry, and 2 others: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1623177 (greg)
[23:42:43] operations, Continuous-Integration-Infrastructure, Patch-For-Review: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://phabricator.wikimedia.org/T72068#1623179 (greg)
[23:51:13] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 17 data above and 9 below the confidence bounds
[23:58:51] 16:59 < wikibugs> Wikibugs: wikibugs - throttle output, don't get kicked for flooding - https://phabricator.wikimedia.org/T112032#1623218 (Dzahn) NEW
[23:58:54] 16:59 -!- wikibugs [tools.wiki@wikimedia/bot/pywikibugs] has quit [Excess Flood]
[23:58:57] :)
[23:59:08] mutante: :-)
[23:59:13] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 17 data above and 9 below the confidence bounds