[00:04:33] Operations, Continuous-Integration-Infrastructure: legoktm can't deploy docker images on contint1001 - https://phabricator.wikimedia.org/T186475#3945046 (Legoktm) p:Triage>High
[02:30:13] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.17) (duration: 06m 03s)
[02:30:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:43] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 855.35 seconds
[04:04:52] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 174.83 seconds
[06:56:22] PROBLEM - Long running screen/tmux on bast4002 is CRITICAL: CRIT: Long running SCREEN process. (PID: 15761, 1729844s 1728000s).
[07:43:06] !log install libjson-c2-dbg on phab1001 to allow better debugging of httpd/mod-php stuck process - T182832
[07:43:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:20] T182832: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832
[07:45:06] !log Deploy schema change on s8 primary master (db1071) - T174569
[07:45:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:45:18] hey
[07:45:18] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569
[07:45:22] PROBLEM - Long running screen/tmux on bast2001 is CRITICAL: CRIT: Long running SCREEN process. (PID: 32511, 1732814s 1728000s).
[07:46:11] who can help with bot attack on otrs queue, 10mail/sec keeps piling up from single ip?
[07:48:11] already 1100 emails
[07:50:47] Operations, ops-codfw, DBA: db2039 disk in predictive failure - https://phabricator.wikimedia.org/T186479#3945149 (Marostegui)
[07:51:39] Operations, ops-codfw, DBA: db2039 disk in predictive failure - https://phabricator.wikimedia.org/T186479#3945161 (Marostegui) p:Triage>Normal
[07:57:14] (PS1) Marostegui: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/408237 (https://phabricator.wikimedia.org/T162807)
[08:08:54] Hi HakanIST, atm there are me and marostegui that are checking OTRS logs but we have not a ton of experience with it so it might take a bit
[08:09:17] HakanIST: you have that IP?
[08:09:19] elukey: thanks, it seems to have stopped now
[08:09:33] yep exactly, the IP would be great
[08:09:36] marostegui: 203.192.188.106
[08:10:02] I think someone else got to it, thanks guys
[08:10:55] super :)
[08:11:05] oh, that is good :)
[08:13:46] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/408237 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[08:15:17] (Merged) jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/408237 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[08:17:12] (CR) jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/408237 (https://phabricator.wikimedia.org/T162807) (owner: Marostegui)
[08:17:57] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1089 - T162807 (duration: 00m 56s)
[08:18:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:09] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807
[08:23:40] (PS1) Marostegui: db-eqiad.php: Depool db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/408238 (https://phabricator.wikimedia.org/T186321)
[08:26:27] (PS1) Marostegui: db1078: Change it to statement, update socket [puppet] - https://gerrit.wikimedia.org/r/408239 (https://phabricator.wikimedia.org/T186321)
[08:26:54] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/408238 (https://phabricator.wikimedia.org/T186321) (owner: Marostegui)
[08:27:24] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3945200 (elukey) So for some reason each httpd proces...
[08:28:32] (Merged) jenkins-bot: db-eqiad.php: Depool db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/408238 (https://phabricator.wikimedia.org/T186321) (owner: Marostegui)
[08:28:42] (CR) jenkins-bot: db-eqiad.php: Depool db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/408238 (https://phabricator.wikimedia.org/T186321) (owner: Marostegui)
[08:30:34] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1078 - T186321 (duration: 00m 55s)
[08:30:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:47] T186321: Prepare and indicate proper master db failover candidates for all database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T186321
[08:43:51] (CR) Hashar: [C: 2] "Included upstream via https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=887353" [debs/pkg-php/php-ast] - https://gerrit.wikimedia.org/r/404283 (owner: Hashar)
[08:44:51] !log Stop MySQL on db1078, upgrade mariadb, kernel and socket location - T186321
[08:45:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:03] T186321: Prepare and indicate proper master db failover candidates for all database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T186321
[08:47:01] (CR) Marostegui: [C: 2] db1078: Change it to statement, update socket [puppet] - https://gerrit.wikimedia.org/r/408239 (https://phabricator.wikimedia.org/T186321) (owner: Marostegui)
[08:51:01] (PS3) Hashar: Rebuild for stretch-wikimedia [debs/pkg-php/php-ast] - https://gerrit.wikimedia.org/r/404284 (https://phabricator.wikimedia.org/T174338)
[08:52:34] (CR) jerkins-bot: [V: -1] Rebuild for stretch-wikimedia [debs/pkg-php/php-ast] - https://gerrit.wikimedia.org/r/404284 (https://phabricator.wikimedia.org/T174338) (owner: Hashar)
[09:19:06] (PS1) Marostegui: db-eqiad.php: Repool db1078 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/408240
[09:22:21] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1078 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/408240 (owner: Marostegui)
[09:22:58] (PS5) Gehel: wdqs: remove cleanup code after migrating to prometheus jmx exporter [puppet] - https://gerrit.wikimedia.org/r/405888 (https://phabricator.wikimedia.org/T182773)
[09:26:04] (PS1) Hashar: support process a lintian output file [debs/jenkins-debian-glue] - https://gerrit.wikimedia.org/r/408243
[09:26:06] (PS1) Hashar: 0.17.0+wmf1: support process a lintian output file [debs/jenkins-debian-glue] - https://gerrit.wikimedia.org/r/408244
[09:27:13] (Abandoned) Hashar: support process a lintian output file [debs/jenkins-debian-glue] - https://gerrit.wikimedia.org/r/408243 (owner: Hashar)
[09:27:15] (CR) Gehel: [C: 2] wdqs: remove cleanup code after migrating to prometheus jmx exporter [puppet] - https://gerrit.wikimedia.org/r/405888 (https://phabricator.wikimedia.org/T182773) (owner: Gehel)
[09:27:20] (Merged) jenkins-bot: db-eqiad.php: Repool db1078 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/408240 (owner: Marostegui)
[09:27:22] (Abandoned) Hashar: 0.17.0+wmf1: support process a lintian output file [debs/jenkins-debian-glue] - https://gerrit.wikimedia.org/r/408244 (owner: Hashar)
[09:28:37] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1078 with low traffic - T186321 (duration: 00m 53s)
[09:28:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:49] T186321: Prepare and indicate proper master db failover candidates for all database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T186321
[09:30:14] (PS1) Marostegui: db-eqiad.php: Increase traffic for db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/408245
[09:31:26] (PS1) Hashar: support process a lintian output file [debs/jenkins-debian-glue] (patch-queue/debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/408246
[09:35:36] (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic for db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/408245 (owner: Marostegui)
[09:37:03] (Merged) jenkins-bot: db-eqiad.php: Increase traffic for db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/408245 (owner: Marostegui)
[09:38:19] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1078 traffic - T186321 (duration: 00m 55s)
[09:38:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:38:32] T186321: Prepare and indicate proper master db failover candidates for all database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T186321
[09:41:12] (CR) jenkins-bot: db-eqiad.php: Repool db1078 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/408240 (owner: Marostegui)
[09:41:13] (CR) jenkins-bot: db-eqiad.php: Increase traffic for db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/408245 (owner: Marostegui)
[09:43:17] (PS1) Hashar: 0.17.0-wmf1: support process a lintian output file [debs/jenkins-debian-glue] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/408247
[09:44:04] (PS1) Marostegui: db-eqiad.php: Increase traffic db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/408248
[09:49:22] PROBLEM - puppet last run on eventlog1001 is CRITICAL: CRITICAL: Puppet has 14 failures. Last run 3 minutes ago with 14 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[eventlogging/analytics],Exec[chown /srv/deployment/eventlogging for eventlogging],Exec[git_pull_mediawiki/event-schemas]
[09:49:37] (PS2) Hashar: support process a lintian output file [debs/jenkins-debian-glue] (patch-queue/debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/408246 (https://phabricator.wikimedia.org/T186494)
[09:50:50] (PS2) Hashar: 0.17.0-wmf1: support process a lintian output file [debs/jenkins-debian-glue] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/408247 (https://phabricator.wikimedia.org/T186494)
[09:52:13] (CR) Marostegui: [C: 2] db-eqiad.php: Increase traffic db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/408248 (owner: Marostegui)
[09:54:24] (Merged) jenkins-bot: db-eqiad.php: Increase traffic db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/408248 (owner: Marostegui)
[09:55:29] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1078 traffic (duration: 00m 55s)
[09:55:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:08] (CR) jenkins-bot: db-eqiad.php: Increase traffic db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/408248 (owner: Marostegui)
[10:08:18] (PS1) Marostegui: db-eqiad.php: Fully repool db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/408250
[10:10:38] (CR) Marostegui: [C: 2] db-eqiad.php: Fully repool db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/408250 (owner: Marostegui)
[10:12:04] (Merged) jenkins-bot: db-eqiad.php: Fully repool db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/408250 (owner: Marostegui)
[10:12:16] (CR) jenkins-bot: db-eqiad.php: Fully repool db1078 [mediawiki-config] - https://gerrit.wikimedia.org/r/408250 (owner: Marostegui)
[10:13:29] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully repool db1078 (duration: 00m 55s)
[10:13:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:16:08] (PS1) Elukey: Force JAVA_HOME to openjdk-8's jre for all Hadoop daemons [puppet] - https://gerrit.wikimedia.org/r/408251 (https://phabricator.wikimedia.org/T166248)
[10:19:22] RECOVERY - puppet last run on eventlog1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[10:25:51] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler02/9858/" [puppet] - https://gerrit.wikimedia.org/r/408251 (https://phabricator.wikimedia.org/T166248) (owner: Elukey)
[10:47:21] Operations, MediaWiki-Platform-Team, HHVM, NewPHP, and 2 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3945544 (ArielGlenn) >>! In T176370#3934147, @Imarlier wrote: > @MoritzMuehlenhoff Is someone actively working on dumps? I haven't seen movement on http...
[10:48:04] Operations, Continuous-Integration-Infrastructure: Upgrade jenkins-debian-glue on Jessie slaves from 0.13.0 to latest (0.17.0) - https://phabricator.wikimedia.org/T141114#3945550 (hashar) And eventually I found out today that our 0.17.0 package include a few more patches from upstream: | * 09a78f2 - (ge...
[10:51:23] PROBLEM - puppet last run on eventlog1001 is CRITICAL: CRITICAL: Puppet has 14 failures. Last run 5 minutes ago with 14 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[eventlogging/analytics],Exec[chown /srv/deployment/eventlogging for eventlogging],Exec[git_pull_mediawiki/event-schemas]
[10:52:30] elukey: this seems to be flapping ^^^
[10:52:58] Could not evaluate: Cannot allocate memory - fork(2)
[10:53:01] lovely
[10:53:14] (PS3) Hashar: support process a lintian output file [debs/jenkins-debian-glue] (patch-queue/debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/408246 (https://phabricator.wikimedia.org/T186494)
[10:53:34] (PS3) Hashar: 0.18.4-wmf1: support process a lintian output file [debs/jenkins-debian-glue] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/408247 (https://phabricator.wikimedia.org/T186494)
[10:53:40] nice
[10:55:42] there is one el process that is taking a lot of memory, not sure what it is doing
[10:59:36] elukey: need a hand?
[11:00:04] jan_drewniak: It is that lovely time of the day again! You are hereby commanded to deploy Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180205T1100).
[11:00:05] No GERRIT patches in the queue for this window AFAICS.
[11:00:30] nono it seems something that has been slowly leaking in time, so I am going to stop/start the daemon that is causing trouble, and then report back to my team
[11:03:03] !log restart eventlogging/forwarder legacy-zmq on eventlog1001 due to slow memory leak over time (cached memory down to zero)
[11:03:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:03:29] hi apergos ! around? :)
[11:03:38] (CR) Hashar: "recheck" [debs/jenkins-debian-glue] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/408247 (https://phabricator.wikimedia.org/T186494) (owner: Hashar)
[11:03:42] (I just saw you comment on a ticket 5 mins ago) :D
[11:04:29] !log Upgraded jenkins-debian-glue to 0.18.4-wmf1 | T186494
[11:04:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:04:41] T186494: jenkins-debian-glue should run the lintian version from cowbuilder instead of from host - https://phabricator.wikimedia.org/T186494
[11:04:56] volans: seems better now https://grafana.wikimedia.org/dashboard/db/prometheus-machine-stats?orgId=1&panelId=4&fullscreen&var-server=eventlog1001&var-datasource=eqiad%20prometheus%2Fops&from=1517827153282&to=1517828620444 :D
[11:07:29] (CR) Hashar: [C: -2] "recheck" [debs/pkg-php/php-ast] - https://gerrit.wikimedia.org/r/404284 (https://phabricator.wikimedia.org/T174338) (owner: Hashar)
[11:07:58] elukey:ehehe yeah, the 30d graph is "interesting" ;)
[11:08:58] (CR) jerkins-bot: [V: -1] Rebuild for stretch-wikimedia [debs/pkg-php/php-ast] - https://gerrit.wikimedia.org/r/404284 (https://phabricator.wikimedia.org/T174338) (owner: Hashar)
[11:12:01] addshore: yes indeed
[11:12:23] what's up, other than that I added you not-quite-gratuitously to the mw third parties ticket? :-P
[11:12:39] hehe, I was already subscribed ;)
[11:12:52] whew!
[11:12:58] I poked you briefly about getting my 2FA on phab reset during the week of the dev summit / all hands
[11:13:05] oh yeah, and then I disappeared
[11:13:07] I never ended up getting that done and was wondering if we could do it now :D
[11:13:33] Otherwise, one of these days phab will log out on my laptop and I won't be able to write any more comments of add important tokens to tickets!
[11:13:49] indeed we can do it, where's the task again?
[11:14:06] I haven't filed a task, not really sure what the process would be!
[11:14:10] can do if needed
[11:14:27] (PS1) Giuseppe Lavagetto: openldap::management: refactoring to profile [puppet] - https://gerrit.wikimedia.org/r/408256
[11:14:55] let's do it so we're on the record
[11:15:15] <_joe_> you're already on the record, but yeah I think a ticket is preferred
[11:15:16] (PS1) Marostegui: db-eqiad.php: db1078 is the candidate master [mediawiki-config] - https://gerrit.wikimedia.org/r/408257 (https://phabricator.wikimedia.org/T186321)
[11:15:25] <_joe_> so that we're sure you're him :)
[11:15:26] Operations, Phabricator: Reset 2FA for Addshore on Phabricator - https://phabricator.wikimedia.org/T186508#3945688 (Addshore)
[11:15:30] ^^
[11:16:22] RECOVERY - puppet last run on eventlog1001 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[11:16:35] addshore@tin:~$ cat phab
[11:16:35] please reset my 2fa on phab
[11:16:42] Operations, Phabricator: Reset 2FA for Addshore on Phabricator - https://phabricator.wikimedia.org/T186508#3945704 (MarcoAurelio) p:Triage>Normal
[11:18:10] (CR) Marostegui: [C: 2] db-eqiad.php: db1078 is the candidate master [mediawiki-config] - https://gerrit.wikimedia.org/r/408257 (https://phabricator.wikimedia.org/T186321) (owner: Marostegui)
[11:18:45] Operations, Phabricator, User-Addshore: Reset 2FA for Addshore on Phabricator - https://phabricator.wikimedia.org/T186508#3945717 (Addshore)
[11:18:59] (PS3) Filippo Giunchedi: hieradata: extend SMART eqiad deployment [puppet] - https://gerrit.wikimedia.org/r/403621 (https://phabricator.wikimedia.org/T86552)
[11:19:28] Don't know how I managed to copy all my other 2fa stuff over but just miss phab
[11:19:38] (Merged) jenkins-bot: db-eqiad.php: db1078 is the candidate master [mediawiki-config] - https://gerrit.wikimedia.org/r/408257 (https://phabricator.wikimedia.org/T186321) (owner: Marostegui)
[11:19:48] (CR) jenkins-bot: db-eqiad.php: db1078 is the candidate master [mediawiki-config] - https://gerrit.wikimedia.org/r/408257 (https://phabricator.wikimedia.org/T186321) (owner: Marostegui)
[11:20:54] (CR) Filippo Giunchedi: [C: 2] hieradata: extend SMART eqiad deployment [puppet] - https://gerrit.wikimedia.org/r/403621 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi)
[11:21:06] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Clarify db1078 comment as it is the new candidate master for s3 (duration: 00m 55s)
[11:21:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:22:05] Operations, Phabricator, User-Addshore: Reset 2FA for Addshore on Phabricator - https://phabricator.wikimedia.org/T186508#3945688 (ArielGlenn) Identity verified by checking a file created by addshore on tin. Reet done.
[11:22:37] addshore: let me know please that you are able to get in
[11:23:12] re added 2fa, gonna log out and in now
[11:23:53] apergos: works :)
[11:24:02] ok, closing the ticket, enjoy!
[11:26:33] volans: created https://phabricator.wikimedia.org/T186510 :)
[11:27:04] :D
[11:29:42] PROBLEM - puppet last run on labstore1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:29:58] labstore is possibly me, checking
[11:30:12] PROBLEM - puppet last run on cp3036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:30:24] !log expand smart metrics checking rollout with https://gerrit.wikimedia.org/r/#/c/403621/
[11:30:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:32:10] it is indeed me, double declaration for python3
[11:35:03] (PS1) Filippo Giunchedi: labstore: move to require_package [puppet] - https://gerrit.wikimedia.org/r/408258
[11:36:13] fix ^
[11:36:44] anyone holding a rubberstamp?
[11:37:32] PROBLEM - puppet last run on labstore1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:38:02] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/408258 (owner: Filippo Giunchedi)
[11:38:07] godog: ^
[11:38:31] thanks volans !
[11:38:39] (CR) Filippo Giunchedi: [C: 2] labstore: move to require_package [puppet] - https://gerrit.wikimedia.org/r/408258 (owner: Filippo Giunchedi)
[11:38:51] I like new gerrit's emails
[11:44:33] RECOVERY - puppet last run on labstore1004 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[11:46:10] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3945748 (Paladox) @elukey these were the changes betw...
[11:47:47] (PS6) Paladox: phabricator: Replace mod_php with php-fpm [puppet] - https://gerrit.wikimedia.org/r/407958 (https://phabricator.wikimedia.org/T182832)
[11:55:12] RECOVERY - puppet last run on cp3036 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[11:59:02] Operations, Ops-Access-Requests: Requesting access to analytics-users / webrequest for Esteban - https://phabricator.wikimedia.org/T185988#3945765 (Aklapper) >>! In T185988#3941132, @Esteban wrote: > About NDA, It seems that I do not have access to the one I should sign : https://phabricator.wikimedia.or...
[12:06:46] (PS1) Hashar: Hook to run lintian [debs/jenkins-debian-glue] (patch-queue/debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/408260
[12:07:00] (PS1) Hashar: 0.18.4-wmf2: add hook B90lintian [debs/jenkins-debian-glue] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/408261 (https://phabricator.wikimedia.org/T186494)
[12:07:33] RECOVERY - puppet last run on labstore1005 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[12:10:22] (CR) MarcoAurelio: "I've left some comments on the task. This apparently cannot be sync'd until you remove all pages from the Topic namespace first/convert th" [mediawiki-config] - https://gerrit.wikimedia.org/r/408073 (https://phabricator.wikimedia.org/T186463) (owner: Zoranzoki21)
[12:17:53] !log Drop old and renamed wikidata tables from s5 master (db1070) - T184599
[12:18:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:18:07] T184599: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599
[12:20:37] !log Drop empty wikidata database from s5 master (db1070) - T184599
[12:20:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:29:24] (CR) Hashar: [C: -2] "recheck" [debs/pkg-php/php-ast] - https://gerrit.wikimedia.org/r/404284 (https://phabricator.wikimedia.org/T174338) (owner: Hashar)
[12:30:44] (CR) jerkins-bot: [V: -1] Rebuild for stretch-wikimedia [debs/pkg-php/php-ast] - https://gerrit.wikimedia.org/r/404284 (https://phabricator.wikimedia.org/T174338) (owner: Hashar)
[12:33:02] (PS2) Giuseppe Lavagetto: openldap::management: refactoring to profile [puppet] - https://gerrit.wikimedia.org/r/408256
[12:45:47] (CR) Giuseppe Lavagetto: [C: 2] openldap::management: refactoring to profile [puppet] - https://gerrit.wikimedia.org/r/408256 (owner: Giuseppe Lavagetto)
[12:49:42] RECOVERY - puppet last run on wasat is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[12:52:42] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:52:59] (PS3) Arturo Borrero Gonzalez: apt: disable daily cron job from apt-show-versions [puppet] - https://gerrit.wikimedia.org/r/407456 (https://phabricator.wikimedia.org/T186230)
[12:53:31] someone could +X that so I feel confident for merging? ^^^
[12:54:56] (tiny change, actually)
[12:57:03] <_joe_> arturo: looking
[12:57:45] thanks
[13:00:48] (CR) Giuseppe Lavagetto: [C: 1] "LGTM, that cronjob is indeed useless and annoying. Small style note just as FYI" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/407456 (https://phabricator.wikimedia.org/T186230) (owner: Arturo Borrero Gonzalez)
[13:02:27] (PS4) Arturo Borrero Gonzalez: apt: disable daily cron job from apt-show-versions [puppet] - https://gerrit.wikimedia.org/r/407456 (https://phabricator.wikimedia.org/T186230)
[13:03:49] (CR) Arturo Borrero Gonzalez: [C: 2] apt: disable daily cron job from apt-show-versions [puppet] - https://gerrit.wikimedia.org/r/407456 (https://phabricator.wikimedia.org/T186230) (owner: Arturo Borrero Gonzalez)
[13:12:32] (PS1) KartikMistry: apertium-ukr: Initial Debian packaging [debs/contenttranslation/apertium-ukr] - https://gerrit.wikimedia.org/r/408264 (https://phabricator.wikimedia.org/T184901)
[13:14:56] (PS2) KartikMistry: apertium-ukr: Initial Debian packaging [debs/contenttranslation/apertium-ukr] - https://gerrit.wikimedia.org/r/408264 (https://phabricator.wikimedia.org/T184901)
[13:15:41] (PS1) Hashar: package_builder: ability to override HOOKDIR [puppet] - https://gerrit.wikimedia.org/r/408265 (https://phabricator.wikimedia.org/T186494)
[13:17:04] kart_: akosiaris I have broke the debian glue job sorry :(
[13:17:30] hashar: no worries :)
[13:20:09] !log Rename dewiki tables on s8 master (db1071 - with no replication) before dropping them - T184599
[13:20:09] Operations, Developer-Relations, Discourse: Bring discourse.mediawiki.org to production - https://phabricator.wikimedia.org/T180853#3945887 (Aklapper) >>! In T180853#3944015, @Tgr wrote: > Probably should get Bitergia integration by the time of production deployment. Created T186513
[13:20:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:20] T184599: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599
[13:20:24] Operations, Developer-Relations, Discourse: Bring discourse.mediawiki.org to production - https://phabricator.wikimedia.org/T180853#3945890 (Aklapper)
[13:21:56] jouncebot: next
[13:21:56] In 0 hour(s) and 38 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180205T1400)
[13:23:53] jouncebot reload
[13:24:29] (Abandoned) Hashar: package_builder: ability to override HOOKDIR [puppet] - https://gerrit.wikimedia.org/r/408265 (https://phabricator.wikimedia.org/T186494) (owner: Hashar)
[13:39:04] jouncebot: next
[13:39:05] In 0 hour(s) and 20 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180205T1400)
[13:39:54] (CR) Hashar: "recheck" [debs/jenkins-debian-glue] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/408261 (https://phabricator.wikimedia.org/T186494) (owner: Hashar)
[13:41:12] Operations, Domains, Traffic, WMF-Design, and 2 others: Create subdomain for Design and Wikimedia User Interface Style Guide - https://phabricator.wikimedia.org/T185282#3911827 (Prtksxna) >>! In T185282#3914281, @Dzahn wrote: > @Volker_E Gotcha! So you are requesting web space to clone to from Gi...
[13:45:14] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3945931 (elukey) >>! In T182832#3945748, @Paladox wro...
[13:45:31] (PS3) Rxy: Add 'rollbacker' group at arwikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/406591 (https://phabricator.wikimedia.org/T185720)
[13:47:04] (CR) Hashar: [C: -2] "recheck" [debs/pkg-php/php-ast] - https://gerrit.wikimedia.org/r/404284 (https://phabricator.wikimedia.org/T174338) (owner: Hashar)
[13:48:03] (CR) jerkins-bot: [V: -1] Rebuild for stretch-wikimedia [debs/pkg-php/php-ast] - https://gerrit.wikimedia.org/r/404284 (https://phabricator.wikimedia.org/T174338) (owner: Hashar)
[13:49:39] (CR) Hashar: "recheck" [debs/jenkins-debian-glue] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/408261 (https://phabricator.wikimedia.org/T186494) (owner: Hashar)
[13:58:26] (CR) Rush: [C: -1] "root@tools-bastion-05:~# python apt-upgrade list" [puppet] - https://gerrit.wikimedia.org/r/407465 (https://phabricator.wikimedia.org/T181647) (owner: Arturo Borrero Gonzalez)
[14:00:04] addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Time to snap out of that daydream and deploy European Mid-day SWAT(Max 8 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180205T1400).
[14:00:04] rxy, MatmaRex, stephanebisson, Zoranzoki21, and Zppix: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[14:00:17] hello
[14:00:20] hi
[14:00:22] hi
[14:00:49] rxy: and our dream finaly is here. Our patches will be deployed :)
[14:00:56] *finally
[14:01:09] yeah
[14:03:13] o/
[14:03:26] sorry I am a little late
[14:03:40] hashar: No problem. You will be our dear deployer?
[14:03:56] (CR) Hashar: [C: 2] Add 'rollbacker' group at arwikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/406591 (https://phabricator.wikimedia.org/T185720) (owner: Rxy)
[14:05:38] (Merged) jenkins-bot: Add 'rollbacker' group at arwikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/406591 (https://phabricator.wikimedia.org/T185720) (owner: Rxy)
[14:05:51] Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3945944 (Paladox) Your gdb traceback leads to the new...
[14:07:09] im here
[14:07:13] sorry im late
[14:07:14] (CR) jenkins-bot: Add 'rollbacker' group at arwikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/406591 (https://phabricator.wikimedia.org/T185720) (owner: Rxy)
[14:07:28] rxy: Add 'rollbacker' group at arwikibooks is now on mwdebug1001
[14:07:43] back shortly, reboot
[14:08:01] (PS3) Hashar: Add throttle rule for an event [mediawiki-config] - https://gerrit.wikimedia.org/r/407183 (https://phabricator.wikimedia.org/T185930) (owner: Zppix)
[14:08:07] (CR) Hashar: [C: 2] Add throttle rule for an event [mediawiki-config] - https://gerrit.wikimedia.org/r/407183 (https://phabricator.wikimedia.org/T185930) (owner: Zppix)
[14:09:31] weirds
[14:09:37] (Merged) jenkins-bot: Add throttle rule for an event [mediawiki-config] - https://gerrit.wikimedia.org/r/407183 (https://phabricator.wikimedia.org/T185930) (owner: Zppix)
[14:10:03] Too, can you deploy this change in mediawiki/core this patch https://gerrit.wikimedia.org/r/#/c/368336/ because everyday gerrit sending us email to change is rebased
[14:10:25] rxy: "rollbackers" group seems to show up on https://ar.wikibooks.org/wiki/%D8%AE%D8%A7%D8%B5:%D8%B9%D8%B1%D8%B6_%D8%B5%D9%84%D8%A7%D8%AD%D9%8A%D8%A7%D8%AA_%D8%A7%D9%84%D9%85%D8%AC%D9%85%D9%88%D8%B9%D8%A7%D8%AA :)
[14:10:43] (CR) jenkins-bot: Add throttle rule for an event [mediawiki-config] - https://gerrit.wikimedia.org/r/407183 (https://phabricator.wikimedia.org/T185930) (owner: Zppix)
[14:10:47] sysops should be add or remove that group.
[14:11:45] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Add 'rollbacker' group at arwikibooks - T185720 (duration: 00m 56s)
[14:11:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:00] T185720: Creation of Rollbacker group on ar.wikibooks - https://phabricator.wikimedia.org/T185720
[14:12:15] rxy: and I don't know how to check that :(
[14:12:46] rxy: how that is supposedly listed on the page back. Guess something is missing in the configuration :(
[14:13:37] https://ar.wikibooks.org/wiki/خاص:عرض_صلاحيات_المجموعات?uselang=en#sysop "Add group: IP block exemptions" , "Remove group: IP block exemptions"
[14:13:45] I found problem
[14:13:49] hashar: I found problem
[14:14:03] hashar: I will provide fix of patch
[14:14:11] \o/
[14:14:37] Patch coming for two minutes
[14:14:52] Zppix: I have deployed your throttling change
[14:14:54] !log hashar@tin Synchronized wmf-config/throttle.php: Add throttle rule for an event - T185930 (duration: 00m 55s)
[14:15:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:06] T185930: Lift IP cap for account creation for Art+Feminism at Triangle Arts Association – March 10th - https://phabricator.wikimedia.org/T185930
[14:15:54] stephanebisson: around for Flow "OptInController catch both errors and exception" https://gerrit.wikimedia.org/r/#/c/406631/ ?
[14:16:08] hashar: o/ thanks [14:16:15] hashar: yes [14:16:33] (03Draft2) 10Zoranzoki21: Fix https://gerrit.wikimedia.org/r/#/c/406591/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408270 [14:16:54] hashar: Fix of patch from rxy is here https://gerrit.wikimedia.org/r/#/c/408270/ [14:17:03] ah thanks! [14:17:39] (03CR) 10Rxy: [C: 031] "LGTM, thanks for fixing my mistake" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408270 (owner: 10Zoranzoki21) [14:17:45] stephanebisson: it is on mwdebug1001 now if there is a way to test it [14:18:03] stephanebisson: though I am not sure how you can trigger an error to make sure the throwable is caught :( [14:18:04] hashar: I'll test it right away [14:18:08] (03CR) 10Arturo Borrero Gonzalez: "This is a python3-only script:" [puppet] - 10https://gerrit.wikimedia.org/r/407465 (https://phabricator.wikimedia.org/T181647) (owner: 10Arturo Borrero Gonzalez) [14:18:23] hashar: Please deploy https://gerrit.wikimedia.org/r/#/c/408270/ It is fix of problem with patch of user rxy [14:18:34] hashar: I think I can summon the error in this case [14:18:37] (03CR) 10Hashar: [C: 032] "SWAT. I should have caught it really." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408270 (owner: 10Zoranzoki21) [14:19:18] Zoranzoki21: also can you rebase https://gerrit.wikimedia.org/r/#/c/406400/ ? :) [14:20:04] (03Merged) 10jenkins-bot: Fix https://gerrit.wikimedia.org/r/#/c/406591/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408270 (owner: 10Zoranzoki21) [14:20:07] hashar: No, I can not [14:20:15] (03CR) 10jenkins-bot: Fix https://gerrit.wikimedia.org/r/#/c/406591/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408270 (owner: 10Zoranzoki21) [14:20:46] Now? [14:22:25] hashar: Not the result I was hoping for but it doesn't hurt anything and I need to do more test with it. Please go ahead and deploy. 
[14:22:30] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Fix typo in arwikibooks rollbacker group - T185720 (duration: 00m 56s) [14:22:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:22:42] T185720: Creation of Rollbacker group on ar.wikibooks - https://phabricator.wikimedia.org/T185720 [14:22:45] stephanebisson: okk [14:23:01] rxy: now? [14:23:12] thanks hashar and Zoranzoki21 , It works for me [14:23:28] @server:mw1242.eqiad.wmnet [14:24:24] !log hashar@tin Synchronized php-1.31.0-wmf.17/extensions/Flow/includes/Import/OptInController.php: OptInController catch both errors and exception - T184670 (duration: 00m 55s) [14:24:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:24:36] T184670: [wmf.16-regression] Fatal exception of type "Flow\Exception\InvalidDataException" for opting out from "Structured Discussions on user talk" - https://phabricator.wikimedia.org/T184670 [14:25:35] (03CR) 10Herron: [C: 032] install_server: add dhcp and netboot entries for puppetdb VMs [puppet] - 10https://gerrit.wikimedia.org/r/407852 (https://phabricator.wikimedia.org/T185499) (owner: 10Herron) [14:25:40] (03PS2) 10Herron: install_server: add dhcp and netboot entries for puppetdb VMs [puppet] - 10https://gerrit.wikimedia.org/r/407852 (https://phabricator.wikimedia.org/T185499) [14:25:53] hashar: Which you want my first? https://gerrit.wikimedia.org/r/#/c/407017/ or https://gerrit.wikimedia.org/r/#/c/406476/ [14:26:03] (03PS5) 10Hashar: Add new throttle rules, clean obsolete rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406400 (https://phabricator.wikimedia.org/T185794) (owner: 10Urbanecm) [14:26:20] hashar: Ok, first throttle. 
:) It is much easier [14:26:29] Zoranzoki21: I will pick / rebase the others :) [14:26:42] hashar: I can [14:26:53] (03PS6) 10Zoranzoki21: Set wgNamespaceRobotPolicies on ptwiki's NS_USER to noindex [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406476 (https://phabricator.wikimedia.org/T185660) [14:27:01] (03PS7) 10Hashar: Set wgNamespaceRobotPolicies on ptwiki's NS_USER to noindex [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406476 (https://phabricator.wikimedia.org/T185660) (owner: 10Zoranzoki21) [14:27:04] (03PS4) 10Zoranzoki21: Enable ArticlePlaceholder for Estonian Wikipedia (etwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/407017 (https://phabricator.wikimedia.org/T186107) [14:27:33] hashar: oops. We did it at the same time [14:27:44] (03PS5) 10Hashar: Enable ArticlePlaceholder for Estonian Wikipedia (etwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/407017 (https://phabricator.wikimedia.org/T186107) (owner: 10Zoranzoki21) [14:28:17] (03CR) 10Hashar: [C: 032] Add new throttle rules, clean obsolete rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406400 (https://phabricator.wikimedia.org/T185794) (owner: 10Urbanecm) [14:28:36] stephanebisson: your Flow patch is deployed [14:28:45] hashar: thanks!
[14:29:42] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406476 (https://phabricator.wikimedia.org/T185660) (owner: 10Zoranzoki21) [14:29:44] (03Merged) 10jenkins-bot: Add new throttle rules, clean obsolete rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406400 (https://phabricator.wikimedia.org/T185794) (owner: 10Urbanecm) [14:29:58] (03CR) 10jenkins-bot: Add new throttle rules, clean obsolete rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406400 (https://phabricator.wikimedia.org/T185794) (owner: 10Urbanecm) [14:31:57] !log hashar@tin Synchronized wmf-config/throttle.php: Add throttle rules - T185794 T185811 (duration: 00m 55s) [14:32:07] (03Merged) 10jenkins-bot: Set wgNamespaceRobotPolicies on ptwiki's NS_USER to noindex [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406476 (https://phabricator.wikimedia.org/T185660) (owner: 10Zoranzoki21) [14:32:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:32:09] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/407017 (https://phabricator.wikimedia.org/T186107) (owner: 10Zoranzoki21) [14:32:09] T185811: Lift IP cap for account creation for editathon at MCA Chicago Feb 8th - https://phabricator.wikimedia.org/T185811 [14:32:09] T185794: Lift IP cap for account creation for Art+Feminism at Kickstarter March 4th - https://phabricator.wikimedia.org/T185794 [14:33:23] (03CR) 10jenkins-bot: Set wgNamespaceRobotPolicies on ptwiki's NS_USER to noindex [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406476 (https://phabricator.wikimedia.org/T185660) (owner: 10Zoranzoki21) [14:33:47] (03Merged) 10jenkins-bot: Enable ArticlePlaceholder for Estonian Wikipedia (etwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/407017 (https://phabricator.wikimedia.org/T186107) (owner: 10Zoranzoki21) [14:33:57] (03PS1) 10Herron: install_server: fix missing semicolon in dhcpd config [puppet] - 
10https://gerrit.wikimedia.org/r/408272 [14:34:00] (03CR) 10jenkins-bot: Enable ArticlePlaceholder for Estonian Wikipedia (etwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/407017 (https://phabricator.wikimedia.org/T186107) (owner: 10Zoranzoki21) [14:34:36] (03CR) 10Herron: [C: 032] install_server: fix missing semicolon in dhcpd config [puppet] - 10https://gerrit.wikimedia.org/r/408272 (owner: 10Herron) [14:35:29] Zoranzoki21: will then do Enable ArticlePlaceholder for Estonian Wikipedia (etwiki) [14:35:35] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Set wgNamespaceRobotPolicies on ptwiki's NS_USER to noindex - T185660 (duration: 00m 55s) [14:35:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:35:47] T185660: Set wgNamespaceRobotPolicies on ptwiki's NS_USER to noindex - https://phabricator.wikimedia.org/T185660 [14:35:51] hashar: To I test? [14:35:57] Zoranzoki21: it is on mwdebug1001 now [14:36:36] hashar: Testing [14:37:30] hashar: work without problems [14:38:10] hmm [14:38:28] hashar: what? [14:38:30] https://et.wikipedia.org/wiki/Eri:AboutTopic/Q14384 has a title of Triceratops but doesn't show anything [14:38:38] there is just a redlink "Mall:AboutTopic" [14:39:18] They have not template for it. 
[14:39:21] Other things work OK [14:39:39] I changed the interface there to English, and Mall is the name of the template namespace [14:40:29] ok :) [14:40:47] (03PS1) 10Gehel: New upstream release 1.0.2 [debs/prometheus-elasticsearch-exporter] - 10https://gerrit.wikimedia.org/r/408273 [14:41:11] guys I think something during SWAT has broken a global rename in progress :) [14:41:22] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable ArticlePlaceholder for Estonian Wikipedia (etwiki) - T186107 (duration: 00m 55s) [14:41:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:41:34] T186107: Enable ArticlePlaceholder for Estonian Wikipedia (etwiki) - https://phabricator.wikimedia.org/T186107 [14:41:44] (03CR) 10Gehel: "Note that upstream v1.0.2 was merged in the previous commit." [debs/prometheus-elasticsearch-exporter] - 10https://gerrit.wikimedia.org/r/408273 (owner: 10Gehel) [14:42:36] lol [14:42:39] what? [14:46:24] !log European SWAT completed. I have not deployed matmarex patches to change Abkhaz collation ( https://gerrit.wikimedia.org/r/#/c/406185/ https://gerrit.wikimedia.org/r/#/c/406187/ ) [14:46:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:47:48] 10Operations, 10LDAP-Access-Requests: NDA request for Samtar - https://phabricator.wikimedia.org/T186344#3946088 (10herron) p:05Triage>03Normal [14:48:38] stephanebisson: seems Flow has been throwing a catchable fatal error https://logstash.wikimedia.org/goto/069074b70465816b9bdc9e9ceea58d2a [14:48:48] stephanebisson: maybe that is not a throwable? [14:53:26] (03PS1) 10Filippo Giunchedi: prometheus: calculate nginx/varnish availability over five minutes [puppet] - 10https://gerrit.wikimedia.org/r/408274 (https://phabricator.wikimedia.org/T177195) [14:54:24] hashar: are you finished with SWAT?
[14:54:40] (03PS2) 10Filippo Giunchedi: prometheus: calculate nginx/varnish availability over five minutes [puppet] - 10https://gerrit.wikimedia.org/r/408274 (https://phabricator.wikimedia.org/T177195) [14:55:17] (03CR) 10Filippo Giunchedi: [C: 032] prometheus: calculate nginx/varnish availability over five minutes [puppet] - 10https://gerrit.wikimedia.org/r/408274 (https://phabricator.wikimedia.org/T177195) (owner: 10Filippo Giunchedi) [14:55:53] (03PS2) 10Gehel: New upstream release 1.0.2 [debs/prometheus-elasticsearch-exporter] - 10https://gerrit.wikimedia.org/r/408273 [15:03:51] (03CR) 10Rush: [C: 04-1] "doh :) py3 of course..." [puppet] - 10https://gerrit.wikimedia.org/r/407465 (https://phabricator.wikimedia.org/T181647) (owner: 10Arturo Borrero Gonzalez) [15:03:56] (03PS5) 10Rush: apt: merge report-pending-upgrades script into apt-upgrade [puppet] - 10https://gerrit.wikimedia.org/r/407465 (https://phabricator.wikimedia.org/T181647) (owner: 10Arturo Borrero Gonzalez) [15:04:23] 10Operations, 10Beta-Cluster-Infrastructure, 10Wikimedia-General-or-Unknown: Beta English Wikipedia: History of the page 'Bird' generates a 500 or 503 error - https://phabricator.wikimedia.org/T185969#3946191 (10Aklapper) This seems to be the same as T186186 [15:06:52] 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Wikidata-Query-Service-Sprint: Increase storage space for Wikidata Query Service - https://phabricator.wikimedia.org/T186526#3946220 (10Gehel) [15:09:29] 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Wikidata-Query-Service-Sprint: Increase storage space for Wikidata Query Service - https://phabricator.wikimedia.org/T186526#3946239 (10Gehel) The warranties expirations of the current systems: * wdqs1003: 2019-12-04 * wdqs1004... 
[15:12:15] hashar: I have a flow dump failure from earlier but have not yet investigated; if it still fails on the rerun I will dig into it some [15:17:38] (03CR) 10Ottomata: [C: 031] profile::analytics::refinery::job::camus: add netflow hourly job (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/406951 (https://phabricator.wikimedia.org/T181036) (owner: 10Elukey) [15:18:03] (03CR) 10Ottomata: [C: 031] "Oh, sorry, you already patched to 1 task! :)" [puppet] - 10https://gerrit.wikimedia.org/r/406951 (https://phabricator.wikimedia.org/T181036) (owner: 10Elukey) [15:19:44] (03PS3) 10Gehel: New upstream release 1.0.2 [debs/prometheus-elasticsearch-exporter] - 10https://gerrit.wikimedia.org/r/408273 [15:23:00] (03CR) 10Filippo Giunchedi: [C: 031] New upstream release 1.0.2 [debs/prometheus-elasticsearch-exporter] - 10https://gerrit.wikimedia.org/r/408273 (owner: 10Gehel) [15:25:20] _joe_: just curious, how far off having any things (even tiny things) running in 'prod' on kubernetes? 
[15:25:36] <_joe_> addshore: this quarter, supposedly :) [15:25:45] ooooh, what will be first?> [15:26:18] !log temporary setting CoreDumpDirectory /srv/apache2_dump to httpd on phab1001 (+ httpd reload) to investigate core dumps for T182832 [15:26:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:26:31] T182832: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832 [15:26:45] please let me know any issue with phab, I am playing with some apache settings for some segfaults :( [15:27:15] I was just thinking, I have some pretty trivial small non critical things that it could be tested out with (for example scripts making a web request and then sending data to graphite) [15:27:40] (03PS3) 10Paladox: Gerrit: Remove certificate params [puppet] - 10https://gerrit.wikimedia.org/r/407932 [15:27:47] <_joe_> addshore: mathoid has been picked as a first service to migrate [15:28:02] (03PS3) 10Paladox: Gerrit: Cache groups [puppet] - 10https://gerrit.wikimedia.org/r/407927 [15:28:35] <_joe_> addshore: uhm interesting, you're thinking of scheduling jobs on k8s, which is definitely something that could be done, but we didn't consider it as a use case until now [15:28:59] _joe_: aaah :) Yes, they would be scheduled things. [15:29:54] <_joe_> addshore: yeah for now we were thinking of services, but open a ticket with a proposal if you have one [15:30:11] (03PS15) 10Paladox: ircecho: Support ssl when connecting to irc [puppet] - 10https://gerrit.wikimedia.org/r/405591 [15:30:16] _joe_: okay, ill write a postit note for now and try to write a ticket soon [15:30:17] elukey: tried to edit a task and failed [15:30:41] <_joe_> volans: are you saying phab is not working? [15:30:59] _joe_: also, in theory should it be pretty easy to apply the k8s puppet roles / modules on labs pretty cleanly? or should I expect that have to do lots of poking? 
[15:31:18] volans: I just restarted it, might have been temp [15:31:41] elukey: yep worked now [15:31:43] it works for me now [15:31:44] okok [15:32:04] I think that CoreDumpDirectory needs a restart to get it working [15:32:13] most likely [15:32:17] now I am waiting for a segfault [15:33:50] (03CR) 10Gehel: [V: 032 C: 032] New upstream release 1.0.2 [debs/prometheus-elasticsearch-exporter] - 10https://gerrit.wikimedia.org/r/408273 (owner: 10Gehel) [15:34:19] (03PS1) 10Urbanecm: New throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408281 (https://phabricator.wikimedia.org/T186530) [15:35:29] (03PS1) 10Urbanecm: Typo, it's 2018 not 2017 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408283 [15:35:29] no coredump created after segfault [15:36:12] <_joe_> ahahah [15:36:21] <_joe_> I didn't want to spoil it to you [15:36:39] <_joe_> but I don't think coredumps are created by segfaults in mod_php [15:36:51] <_joe_> in apache I mean [15:37:18] <_joe_> and I think that's a red herring for your current investigation, but I might be wrong [15:38:40] ah yes it might be totally separate but it segfaults every minute so I thought to verify :D [15:40:18] (03CR) 10Zoranzoki21: [C: 031] New throttle rule (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408281 (https://phabricator.wikimedia.org/T186530) (owner: 10Urbanecm) [15:42:16] of course there is a Max core file size [15:42:22] in /proc/pid/limits [15:42:36] soft limit 0, max unlimited [15:44:45] elukey it seems php 5.6 is only supported for security releases now so i doin't think they will fix any of the segfaults in 5.6 http://php.net/supported-versions.php [15:47:55] (03CR) 10Zoranzoki21: [C: 031] Typo, it's 2018 not 2017 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408283 (owner: 10Urbanecm) [15:48:08] (03CR) 10ArielGlenn: "I left a few replies to comments, and I'll get working on fixups." 
(034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/394977 (owner: 10ArielGlenn) [15:48:41] !log mholloway-shell@tin Started deploy [mobileapps/deploy@c9c774e]: Update mobileapps to 1411ccb [15:48:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:49:07] (03PS1) 10Herron: add puppetdb[12]001 to site.pp with role spare::system [puppet] - 10https://gerrit.wikimedia.org/r/408287 (https://phabricator.wikimedia.org/T185499) [15:50:12] (03CR) 10Herron: [C: 032] add puppetdb[12]001 to site.pp with role spare::system [puppet] - 10https://gerrit.wikimedia.org/r/408287 (https://phabricator.wikimedia.org/T185499) (owner: 10Herron) [15:50:28] paladox: let's see what it is first :) [15:50:28] (03PS2) 10Herron: add puppetdb[12]001 to site.pp with role spare::system [puppet] - 10https://gerrit.wikimedia.org/r/408287 (https://phabricator.wikimedia.org/T185499) [15:50:35] :) [15:54:47] !log mholloway-shell@tin Finished deploy [mobileapps/deploy@c9c774e]: Update mobileapps to 1411ccb (duration: 06m 06s) [15:54:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:59:45] 10Operations, 10Ops-Access-Requests: Requesting access to analytics-users / webrequest for Esteban - https://phabricator.wikimedia.org/T185988#3946456 (10Nuria) Sorry but we do not grant access to data unless you have a formal collaboration with research team, i believe the team is not formalizing any more col... 
[15:59:52] 10Operations, 10Ops-Access-Requests: Requesting access to analytics-users / webrequest for Esteban - https://phabricator.wikimedia.org/T185988#3946457 (10Nuria) 05stalled>03declined [16:03:37] !log Renaming wikidata tables on s5 on codfw - T184599 [16:03:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:03:48] T184599: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599 [16:09:07] (03PS1) 10Giuseppe Lavagetto: Safely load yaml files [software/conftool] - 10https://gerrit.wikimedia.org/r/408290 [16:10:30] !log Renaming wikidata tables on s5 on eqiad - T184599 [16:10:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:10:40] (03CR) 10jerkins-bot: [V: 04-1] Safely load yaml files [software/conftool] - 10https://gerrit.wikimedia.org/r/408290 (owner: 10Giuseppe Lavagetto) [16:10:40] T184599: s5 wikidatawiki database cleanup - https://phabricator.wikimedia.org/T184599 [16:15:16] 10Operations, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851#3946492 (10Halfak) Works for me. Thanks :) [16:17:09] 10Operations, 10Traffic, 10Accessibility, 10Browser-Support-Internet-Explorer: Wikipedia no longer accessible to those using some braille devices - https://phabricator.wikimedia.org/T185582#3946497 (10TheDJ) 05Open>03Invalid I'm hereby closing this ticket as invalid (works as intended), as the communic... [16:22:02] PROBLEM - Host mc2036.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [16:22:43] 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Wikidata-Query-Service-Sprint: Increase storage space for Wikidata Query Service - https://phabricator.wikimedia.org/T186526#3946504 (10RobH) I'd advise we purchase the SSDs from the system manufacturers, so they would be covere... 
[16:23:14] ACKNOWLEDGEMENT - HP RAID on db2039 is CRITICAL: CRITICAL: Slot 0: Failed: 1I:1:8 - OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T186533 [16:23:17] 10Operations, 10ops-codfw: Degraded RAID on db2039 - https://phabricator.wikimedia.org/T186533#3946506 (10ops-monitoring-bot) [16:23:24] it finally failed [16:23:50] lol happy for a failed disk [16:23:50] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2039 - https://phabricator.wikimedia.org/T186533#3946510 (10Marostegui) p:05Triage>03Normal [16:24:27] 10Operations, 10ops-codfw, 10DBA: db2039 disk in predictive failure - https://phabricator.wikimedia.org/T186479#3946517 (10Marostegui) [16:24:30] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2039 - https://phabricator.wikimedia.org/T186533#3946506 (10Marostegui) [16:28:10] 10Operations, 10Puppet, 10Patch-For-Review: Build a pair of debian stretch PuppetDB servers - https://phabricator.wikimedia.org/T185499#3946527 (10herron) VMs puppetdb[12]001 have been built in the same locations, and with the same specs, as nitrogen/nihal (as these VMs will be replacing them). Using role... [16:30:16] 10Operations, 10ops-codfw, 10DBA: db2039 disk in predictive failure - https://phabricator.wikimedia.org/T186479#3946531 (10Papaul) a:05Papaul>03Marostegui Disk replacement complete. [16:31:44] 10Operations, 10ops-codfw, 10DBA: db2039 disk in predictive failure - https://phabricator.wikimedia.org/T186479#3946535 (10Marostegui) Thanks @Papaul - that was fast! Will close once it is completed ``` logicaldrive 1 (3.3 TB, RAID 1+0, Recovering, 2% complete) physicaldrive 1I:1:1 (port 1I:box... 
[16:32:02] 10Operations, 10ops-eqiad: Offline uncorrectable sectors on poolcounter1002 /dev/sda - https://phabricator.wikimedia.org/T186534#3946536 (10fgiunchedi) [16:32:16] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2039 - https://phabricator.wikimedia.org/T186533#3946546 (10Marostegui) Thanks @Papaul - that was fast! Will close once it is completed ``` logicaldrive 1 (3.3 TB, RAID 1+0, Recovering, 2% complete) physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB,... [16:32:34] RECOVERY - Host mc2036.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.66 ms [16:32:51] 10Operations, 10ops-eqiad: OfflineUncorrectableSector on mw1256 sda - https://phabricator.wikimedia.org/T186535#3946547 (10fgiunchedi) [16:38:58] (03Draft1) 10Paladox: Gerrit: Set notedb configs to enable notedb [puppet] - 10https://gerrit.wikimedia.org/r/408298 [16:39:23] (03PS2) 10Paladox: Gerrit: Set notedb configs to enable notedb [puppet] - 10https://gerrit.wikimedia.org/r/408298 (https://phabricator.wikimedia.org/T174034) [16:40:43] 10Operations, 10ops-codfw: mc2036 mainboard fuse failure - https://phabricator.wikimedia.org/T185587#3946585 (10Papaul) HP will send a replacement main board for the system. Hello Papaul, If that's the case, I would be setting up an onsite service and recommending the system board to be replaced. Kindly con... [16:44:33] 10Operations, 10Puppet: Port puppetlabs PuppetDB 4.4 package to stretch - https://phabricator.wikimedia.org/T185502#3946597 (10herron) Packages have been built for stretch and working ok labs puppetdb testing instance. To be added to main apt repo at a later date. [16:57:04] elukey: thank you for fixing the phabricator (aphlict) log rotate [16:57:16] paladox: cc: ^ [16:57:22] thanks :) [17:00:34] mutante: no problem! 
I hope that the copytruncate is fine, we can add a postrotate rule in case something better is available (reload seems not be available from the systemd unit but I might be wrong) [17:04:59] elukey reload is avilable in systemd. [17:08:02] paladox: sure, but as far as I know needs to be specifically configured (and the daemon needs to know how to do it) [17:08:15] yep [17:22:19] 10Operations, 10ops-eqiad, 10Traffic, 10Patch-For-Review: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293#3946656 (10RobH) a:05BBlack>03Cmjohnson Ok, this is ready to have the 4 new LVS systems racked, one per row. [17:22:41] 10Operations, 10ops-eqiad, 10Traffic, 10Patch-For-Review: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293#3946659 (10RobH) [17:28:20] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Rack and setup db1115 (tendril replacement database) - https://phabricator.wikimedia.org/T185788#3925325 (10RobH) Why did we call this tendril2001 in codfw, but db1115 in eqiad? [17:30:49] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Rack and setup db1115 (tendril replacement database) - https://phabricator.wikimedia.org/T185788#3946714 (10Marostegui) See discussion at T186123 and starting at: T185788#3940445 (basically when I created this ticket I didn't know it was being decided t... 
[17:32:03] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1030 - https://phabricator.wikimedia.org/T184397#3946723 (10Marostegui) [17:32:42] !log mholloway-shell@tin Started deploy [mobileapps/deploy@d970b61]: Update mobileapps to 7a9fab3 [17:32:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:38:27] !log mholloway-shell@tin Finished deploy [mobileapps/deploy@d970b61]: Update mobileapps to 7a9fab3 (duration: 05m 45s) [17:38:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:53:01] 10Operations, 10Analytics, 10Analytics-Cluster: Clean up permissions for privatedata files on stat1005 - they should be group readable by statistics-privatedata-users - https://phabricator.wikimedia.org/T89887#3946826 (10Nuria) [17:54:05] 10Operations, 10Analytics, 10Analytics-Cluster: Clean up permissions for privatedata files on stat1005 - they should be group readable by statistics-privatedata-users - https://phabricator.wikimedia.org/T89887#3946832 (10Milimetric) [17:56:15] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to ops group in admin for bstorm - https://phabricator.wikimedia.org/T185591#3946843 (10Dzahn) 05stalled>03Open [17:56:17] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3946844 (10Dzahn) [17:59:33] (03PS2) 10Dzahn: admin: Add bstorm to production shell users [puppet] - 10https://gerrit.wikimedia.org/r/405922 (https://phabricator.wikimedia.org/T185591) (owner: 10Madhuvishy) [18:00:07] gehel: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180205T1800). [18:00:07] No GERRIT patches in the queue for this window AFAICS. 
[18:00:07] (03CR) 10jerkins-bot: [V: 04-1] admin: Add bstorm to production shell users [puppet] - 10https://gerrit.wikimedia.org/r/405922 (https://phabricator.wikimedia.org/T185591) (owner: 10Madhuvishy) [18:00:20] (03PS3) 10Dzahn: admin: Add bstorm to production shell users [puppet] - 10https://gerrit.wikimedia.org/r/405922 (https://phabricator.wikimedia.org/T185591) (owner: 10Madhuvishy) [18:00:35] jouncebot: actually, wdqs upgrade is coming up! [18:00:49] (03CR) 10jerkins-bot: [V: 04-1] admin: Add bstorm to production shell users [puppet] - 10https://gerrit.wikimedia.org/r/405922 (https://phabricator.wikimedia.org/T185591) (owner: 10Madhuvishy) [18:02:34] 10Operations, 10LDAP-Access-Requests: NDA request for Samtar - https://phabricator.wikimedia.org/T186344#3946856 (10RobH) Ok, we chatted about this in the SRE meeting. It seems your shell account, and thus your ldap account, had rights revoked due to regular account auditing. It seems at that time a number o... [18:04:17] (03PS4) 10Dzahn: admins: Add bstorm to production shell users [puppet] - 10https://gerrit.wikimedia.org/r/405922 (https://phabricator.wikimedia.org/T185591) (owner: 10Madhuvishy) [18:04:43] !log gehel@tin Started deploy [wdqs/wdqs@d7eb899]: wdqs blazegraph + gui + updater upgrade [18:04:43] (03PS5) 10Zoranzoki21: Enable ArticlePlaceholder extension for urwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408043 (https://phabricator.wikimedia.org/T186451) (owner: 10محمد شعیب) [18:04:55] (03PS1) 10RobH: adding samtar to ldap users [puppet] - 10https://gerrit.wikimedia.org/r/408320 (https://phabricator.wikimedia.org/T186344) [18:04:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:05:10] (03PS1) 10Ayounsi: Assign IPs for cr1-ulsfo <--> cr1-eqsin [dns] - 10https://gerrit.wikimedia.org/r/408321 [18:05:16] (03PS2) 10RobH: adding samtar to ldap users [puppet] - 10https://gerrit.wikimedia.org/r/408320 (https://phabricator.wikimedia.org/T186344) [18:05:33] 
(03CR) 10Ayounsi: [C: 032] Assign IPs for cr1-ulsfo <--> cr1-eqsin [dns] - 10https://gerrit.wikimedia.org/r/408321 (owner: 10Ayounsi) [18:06:31] (03CR) 10RobH: [C: 032] "Huh, it worked with testing. I suppose the ldap section isn't parsed for inclusion on hosts, which makes perfect sense. I just didn't ex" [puppet] - 10https://gerrit.wikimedia.org/r/408320 (https://phabricator.wikimedia.org/T186344) (owner: 10RobH) [18:07:19] !log gehel@tin Finished deploy [wdqs/wdqs@d7eb899]: wdqs blazegraph + gui + updater upgrade (duration: 02m 36s) [18:07:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:08:09] SMalyshev: wdqs deployment completed, tests are green [18:08:43] 10Operations, 10LDAP-Access-Requests, 10Patch-For-Review: NDA request for Samtar - https://phabricator.wikimedia.org/T186344#3946876 (10RobH) 05Open>03Resolved a:03RobH @samtar: Ok, I've added back the NDA flag to your ldap account! Let me know if you have any further questions. [18:09:02] (03PS5) 10Dzahn: admins: Add bstorm to production shell users [puppet] - 10https://gerrit.wikimedia.org/r/405922 (https://phabricator.wikimedia.org/T185591) (owner: 10Madhuvishy) [18:12:59] (03CR) 10Dzahn: [C: 032] admins: Add bstorm to production shell users [puppet] - 10https://gerrit.wikimedia.org/r/405922 (https://phabricator.wikimedia.org/T185591) (owner: 10Madhuvishy) [18:16:20] (03CR) 10Dzahn: [C: 032] "[bast1001:~] $ id bstorm" [puppet] - 10https://gerrit.wikimedia.org/r/405922 (https://phabricator.wikimedia.org/T185591) (owner: 10Madhuvishy) [18:16:44] (03PS2) 10Dzahn: Add user bstorm to group ops [puppet] - 10https://gerrit.wikimedia.org/r/405923 (https://phabricator.wikimedia.org/T185591) (owner: 10Madhuvishy) [18:18:02] !log ppchelko@tin Started deploy [restbase/deploy@55e9d87]: Enable ensure_content_type filter for mobile content [18:18:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:19:29] (03CR) 10Dzahn: [C: 032] Add user bstorm to group 
ops [puppet] - 10https://gerrit.wikimedia.org/r/405923 (https://phabricator.wikimedia.org/T185591) (owner: 10Madhuvishy) [18:21:54] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), 10User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3946975 (10elukey) I placed `CoreDumpDirectory /srv/apa... [18:26:42] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to ops group in admin for bstorm - https://phabricator.wikimedia.org/T185591#3946987 (10madhuvishy) [18:27:16] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3946991 (10madhuvishy) [18:27:22] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to ops group in admin for bstorm - https://phabricator.wikimedia.org/T185591#3920534 (10madhuvishy) 05Open>03Resolved a:03madhuvishy [18:28:43] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3917465 (10madhuvishy) [18:28:47] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to ops group in admin for bstorm - https://phabricator.wikimedia.org/T185591#3920534 (10Dzahn) ``` [bast1001:~] $ id bstorm uid=18713(bstorm) gid=500(wikidev) groups=500(wikidev),50(staff),700(ops),600(all-users) ``` [18:30:06] !log ppchelko@tin Finished deploy [restbase/deploy@55e9d87]: Enable ensure_content_type filter for mobile content (duration: 12m 04s) [18:30:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:31:07] !log added bstorm to the 'wmf' and 'ops' LDAP groups (modify-ldap-groups on terbium) (T185493) [18:31:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:31:20] T185493: Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493 
[18:31:57] 10Operations, 10Analytics, 10Code-Stewardship-Reviews, 10Tools, 10Wikimedia-IRC-RC-Server: IRC RecentChanges feed: code stewardship request - https://phabricator.wikimedia.org/T185319#3912816 (10greg) Feedback also at https://www.mediawiki.org/wiki/Talk:Code_stewardship_reviews/Feedback_solicitation/IRCR... [18:32:22] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3947022 (10madhuvishy) [18:33:06] ACKNOWLEDGEMENT - HP RAID on db2039 is CRITICAL: CRITICAL: Slot 0: Failed: 1I:1:8 - OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T186549 [18:33:10] 10Operations, 10ops-codfw: Degraded RAID on db2039 - https://phabricator.wikimedia.org/T186549#3947031 (10ops-monitoring-bot) [18:33:40] 10Operations, 10Traffic, 10netops: Anycast recdns - https://phabricator.wikimedia.org/T186550#3947034 (10BBlack) p:05Triage>03Normal [18:35:05] !log welcome new root shell user bstorm [18:35:16] !log add 'ulimit -c unlimited' to /etc/default/apache2 to see if httpd's CoreDumpDirectory works properly on phab1001 [18:35:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:35:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:35:30] need also to restart apache, sorry for the trouble [18:37:09] !log added bstorm to acl*operations-team (project 29) on Phabricator (T185493) [18:37:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:37:21] T185493: Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493 [18:38:02] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3947055 (10Dzahn) [18:38:46] ohh finally core limit removed [18:39:01] :) [18:41:54] (03PS1) 10Madhuvishy: 
onboarding: Add Bstorm as prod icinga contact [puppet] - 10https://gerrit.wikimedia.org/r/408327 (https://phabricator.wikimedia.org/T185493) [18:42:12] of course now it doesn't segfault [18:42:20] of course not [18:42:33] and I don't know about extensions, i guess a phab admin might have to do it [18:42:40] * apergos eyes no_justification [18:42:58] !log ppchelko@tin Started deploy [restbase/deploy@44f2d2b]: Pass cache-control headers to /sys/mobileapps [18:43:07] Huh? [18:43:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:43:20] (03PS1) 10Madhuvishy: onboarding: Add bstorm to sms contact group [puppet] - 10https://gerrit.wikimedia.org/r/408328 (https://phabricator.wikimedia.org/T185493) [18:43:28] quick way to disable a phab extension [18:43:35] Idk [18:43:42] Probably "remove the code" [18:43:48] ugh [18:43:49] There's no extension UI or anything [18:43:56] seriously? [18:44:25] in some other life if I wrote php... [18:45:06] apergos per no_justification the only way is to remove the code [18:45:39] nice [18:45:42] 10Operations, 10Traffic, 10Accessibility, 10Browser-Support-Internet-Explorer: Wikipedia no longer accessible to those using some braille devices - https://phabricator.wikimedia.org/T185582#3947075 (10Volker_E) Just for completion, I haven't heard back from the company for the last two weeks on the email r... [18:46:54] AH00051: child pid 5019 exit signal Segmentation fault (11), possible coredump in /srv/apache2_dump [18:46:56] \o/ [18:47:18] ahahha and there's nothing in the dir [18:47:21] * elukey cries in a corner [18:48:07] 10Operations, 10Continuous-Integration-Infrastructure: legoktm can't deploy docker images on contint1001 - https://phabricator.wikimedia.org/T186475#3947087 (10thcipriani) I think we need to add other members of contint-admins to the contint-docker group to ensure that they are able to upload docker images cre... 
[18:48:13] but it goes to /var/tmp/core/ [18:49:41] uhhhhh [18:49:49] maybe you could sym link the dir :-P [18:50:20] there will be some reason it won't like that, I'm sure [18:52:19] /proc/sys/kernel/core_pattern needs to be set too [18:53:29] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), 10User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3947103 (10elukey) Adding ulimits -c unlimited to /etc/... [18:57:40] !log ppchelko@tin Finished deploy [restbase/deploy@44f2d2b]: Pass cache-control headers to /sys/mobileapps (duration: 14m 42s) [18:57:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:00:04] addshore, hashar, anomie, no_justification, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Time to snap out of that daydream and deploy Morning SWAT (Max 8 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180205T1900). [19:00:04] MatmaRex and Zoranzoki21: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [19:00:05] (03CR) 10Thcipriani: [C: 031] "Cherry-picked on beta for the past week and change." [puppet] - 10https://gerrit.wikimedia.org/r/403574 (https://phabricator.wikimedia.org/T183999) (owner: 10Thcipriani) [19:00:43] hi. i am here this time. [19:03:15] (03CR) 10Dzahn: [C: 031] "+1 but only after a contact "bstorm" has been created in contacts.cfg in private repo" [puppet] - 10https://gerrit.wikimedia.org/r/408328 (https://phabricator.wikimedia.org/T185493) (owner: 10Madhuvishy) [19:04:33] I am here [19:04:36] Sorry for late [19:04:49] Who doing this swat? [19:05:27] (03CR) 10Dzahn: [C: 031] "lgtm, matching "cn". 
this is about permissions to ACK Icinga issues and schedule downtimes etc" [puppet] - 10https://gerrit.wikimedia.org/r/408327 (https://phabricator.wikimedia.org/T185493) (owner: 10Madhuvishy) [19:05:34] !log executed 'echo '/srv/apache2_dump/core.%h.%e.%p.%t' > /proc/sys/kernel/core_pattern' on phab1001 - T182832 [19:05:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:05:45] T182832: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832 [19:05:47] so this should allow proper generation of coredumps in /srv and not in / [19:05:50] (03CR) 10Madhuvishy: [C: 032] onboarding: Add Bstorm as prod icinga contact [puppet] - 10https://gerrit.wikimedia.org/r/408327 (https://phabricator.wikimedia.org/T185493) (owner: 10Madhuvishy) [19:05:52] where we have plenty of space [19:05:55] 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#3947117 (10BBlack) Meta-update since this is quite stalled out now. I'll try to line up all the explanatory bits here that are affecting proces... [19:06:49] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), 10User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3947119 (10elukey) ``` elukey@phab1001:~$ sudo cat /pro... [19:07:34] hey [19:07:36] who doing swat? 
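The `core_pattern` value logged above (`/srv/apache2_dump/core.%h.%e.%p.%t`) uses the kernel's template specifiers: `%h` for hostname, `%e` for executable name, `%p` for PID and `%t` for the dump time in epoch seconds. A minimal sketch of how such a template expands into a file name follows. The helper `expand_core_pattern` is hypothetical and handles only the specifiers used here; the PID 5019 is taken from the segfault message in the log, while the executable name `apache2` and the timestamp are assumed for illustration.

```python
import re


def expand_core_pattern(pattern, hostname, exe, pid, timestamp):
    """Expand the subset of core_pattern specifiers used in the log.

    Supported: %h (hostname), %e (executable name), %p (PID),
    %t (epoch seconds), %% (literal percent sign).
    This is an illustration, not the kernel's implementation.
    """
    values = {
        "h": hostname,
        "e": exe,
        "p": str(pid),
        "t": str(timestamp),
        "%": "%",
    }

    def substitute(match):
        spec = match.group(1)
        if spec not in values:
            # The kernel knows more specifiers; this sketch rejects them.
            raise ValueError("unsupported specifier: %" + spec)
        return values[spec]

    return re.sub(r"%(.)", substitute, pattern)


# Pattern exactly as written to /proc/sys/kernel/core_pattern in the log.
PATTERN = "/srv/apache2_dump/core.%h.%e.%p.%t"

# Executable name and timestamp are assumed values for illustration.
core_file = expand_core_pattern(PATTERN, "phab1001", "apache2", 5019, 1517857614)
print(core_file)
```

Because the pattern is an absolute path under `/srv/apache2_dump`, dumps land on the partition with free space rather than under `/`, which is the point made in the channel.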
[19:07:47] jouncebot: next [19:07:47] In 0 hour(s) and 52 minute(s): (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180205T2000) [19:08:49] It showing me as current [19:09:14] jouncebot: now [19:09:14] For the next 0 hour(s) and 50 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180205T1900) [19:09:38] aha [19:09:40] for 50 minutes [19:09:40] ok [19:10:29] (03CR) 10Bstorm: [C: 032] onboarding: Add bstorm to sms contact group [puppet] - 10https://gerrit.wikimedia.org/r/408328 (https://phabricator.wikimedia.org/T185493) (owner: 10Madhuvishy) [19:11:09] ohhh nice now coredumps are ok on phab1001 [19:11:15] so nobody is swat-ing? [19:13:04] PROBLEM - Check correctness of the icinga configuration on einsteinium is CRITICAL: Icinga configuration contains errors [19:13:37] madhuvishy: ^ that will be the new user addition.. let's see [19:13:45] checking with "icinga -v" [19:13:54] huh [19:14:40] Error: Service notification period 'MST_awake_hours' specified for contact 'bstorm' [19:14:48] MST [19:14:50] hmmm [19:14:53] these periods are defined in timeperiods.cfg [19:14:56] in public repo [19:15:14] MST_awake_hours is probably not defined yet [19:15:24] mutante: yup [19:15:30] ./modules/nagios_common/files/timeperiods.cfg [19:16:03] maybe brooke wants to define that time period herself [19:16:24] PROBLEM - HHVM rendering on mw2128 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:17:00] bstorm_: it seems MST is not a defined time period in our config yet, would you like to patch that? [19:17:08] modules/nagios_common/files/timeperiods.cfg in ops/puppet [19:17:12] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), 10User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3947150 (10elukey) First hit for a similar stacktrace w... 
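The Icinga failure above comes down to a referential check: every notification period named by a contact must exist as a defined time period. The sketch below illustrates that check only and is not how `icinga -v` is implemented. The function name and data layout are hypothetical, and `PST_awake_hours` and `24x7` are assumed to be defined, by analogy with the `MST_awake_hours` name in the error message.

```python
def undefined_notification_periods(contacts, timeperiods):
    """Return (contact, period) pairs that reference an undefined time period."""
    defined = set(timeperiods)
    problems = []
    for contact, periods in contacts.items():
        for period in periods:
            if period not in defined:
                problems.append((contact, period))
    return problems


# Assumed contents of timeperiods.cfg at the time of the error.
timeperiods = ["24x7", "PST_awake_hours"]

# Contact as first merged: references a period nobody has defined yet.
before = undefined_notification_periods({"bstorm": ["MST_awake_hours"]}, timeperiods)

# Contact after being switched to an existing period.
after = undefined_notification_periods({"bstorm": ["PST_awake_hours"]}, timeperiods)

print(before)
print(after)
```

Either remedy discussed in the channel clears the list: point the contact at an existing period, or add the missing period to `modules/nagios_common/files/timeperiods.cfg`.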
[19:17:12] PST will do [19:17:15] RECOVERY - HHVM rendering on mw2128 is OK: HTTP OK: HTTP/1.1 200 OK - 79540 bytes in 0.399 second response time [19:17:34] bstorm_: cool, i'll switch you - feel free to patch and then we can switch back if you want [19:17:35] (03PS6) 10Chad: Enable ArticlePlaceholder extension for urwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408043 (https://phabricator.wikimedia.org/T186451) (owner: 10محمد شعیب) [19:17:36] I'm often in PST. [19:17:43] Sure thanks :) [19:17:43] (03CR) 10Chad: [C: 032] Enable ArticlePlaceholder extension for urwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408043 (https://phabricator.wikimedia.org/T186451) (owner: 10محمد شعیب) [19:18:30] Since my favorite MST location is Arizona, it's kind of broken anyway and often in sync with PST (AZ has no daylight savings) [19:19:35] bstorm_: right okay :) [19:20:14] no_justification: BTW, you C+2'ed https://gerrit.wikimedia.org/r/#/c/407686/ on Friday but it didn't merge (or get deployed, I assume). [19:20:34] (03Merged) 10jenkins-bot: Enable ArticlePlaceholder extension for urwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408043 (https://phabricator.wikimedia.org/T186451) (owner: 10محمد شعیب) [19:20:45] James_F: Cuz I realized it had a parent. [19:20:48] (03CR) 10jenkins-bot: Enable ArticlePlaceholder extension for urwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408043 (https://phabricator.wikimedia.org/T186451) (owner: 10محمد شعیب) [19:20:50] That I didn't wanna merge yet ;-) [19:21:02] no_justification: Yeah, I de-parented it. ;-) [19:22:15] I am confused [19:22:20] hey bstorm_ you are in here with a _, you want to change that to the name without a _? 
[19:22:27] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: Enable ArticlePlaceholder extension for urwiki (duration: 00m 56s) [19:22:36] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3947177 (10madhuvishy) [19:22:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:22:56] RECOVERY - Check correctness of the icinga configuration on einsteinium is OK: Icinga configuration is correct [19:23:08] That is a result of my setting things up with irc cloud, and it tripping a little along the way. I've embraced the underscore since. [19:23:22] So I'm good with it :) [19:23:27] ah ok [19:23:29] sec then [19:24:45] (03CR) 10jenkins-bot: MWWikiversions::writeWikiVersionsFile: No need to support PHP 5.3 any more [mediawiki-config] - 10https://gerrit.wikimedia.org/r/407686 (owner: 10Jforrester) [19:25:49] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), 10User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3947179 (10Paladox) @elukey i have a patch to switch to... 
[19:26:06] !log demon@tin Synchronized multiversion/MWWikiversions.php: drop php5.3 support (duration: 00m 56s) [19:26:15] (03PS5) 10Zoranzoki21: Change namespaces on urwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/407901 (https://phabricator.wikimedia.org/T186393) [19:26:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:27:19] !log demon@tin Started scap: adding collation for Abkhaz [19:27:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:28:53] (03PS1) 10Madhuvishy: icinga: Add Arturo as prod icinga contact [puppet] - 10https://gerrit.wikimedia.org/r/408340 (https://phabricator.wikimedia.org/T178807) [19:30:17] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 78, down: 1, dormant: 0, excluded: 0, unused: 0 [19:31:02] (03CR) 10Dzahn: [C: 031] "matches his "cn" in LDAP" [puppet] - 10https://gerrit.wikimedia.org/r/408340 (https://phabricator.wikimedia.org/T178807) (owner: 10Madhuvishy) [19:31:13] (03CR) 10Madhuvishy: [C: 032] icinga: Add Arturo as prod icinga contact [puppet] - 10https://gerrit.wikimedia.org/r/408340 (https://phabricator.wikimedia.org/T178807) (owner: 10Madhuvishy) [19:32:26] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 79, down: 0, dormant: 0, excluded: 0, unused: 0 [19:32:30] !log demon@tin Finished scap: adding collation for Abkhaz (duration: 05m 12s) [19:32:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:33:31] brion: scap approaching 5m mark ^ ;-) [19:33:51] nice1 [19:34:12] (03CR) 10Odder: "This patch has introduced a regression - the 1.5x and 2x logos for the Urdu Wikipedia use the pre-2010 globe. 
Could we have this fixed, pl" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341993 (https://phabricator.wikimedia.org/T150618) (owner: 10Urbanecm) [19:34:13] I would love the 1 minute scap for non-localization stuff [19:34:18] maybe that's a pipe dream [19:34:40] It's getting there.... [19:34:59] 19:29:30 Started scap-cdb-rebuild [19:35:00] scap-cdb-rebuild: 100% (ok: 295; fail: 0; left: 0) [19:35:00] 19:32:24 Finished scap-cdb-rebuild (duration: 02m 53s) [19:35:04] There's the bulk of it ^ [19:35:12] wow [19:35:15] Drop the cdbs, we've got like 1-2 min scaps [19:35:20] so if we could get that down to a few seconds somehow [19:36:49] (03PS2) 10Chad: Set wgCategoryCollation for abwiki (Abkhaz Wikipedia) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406185 (https://phabricator.wikimedia.org/T183430) (owner: 10Bartosz Dziewoński) [19:36:54] (03CR) 10Chad: [C: 032] Set wgCategoryCollation for abwiki (Abkhaz Wikipedia) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406185 (https://phabricator.wikimedia.org/T183430) (owner: 10Bartosz Dziewoński) [19:37:18] no_justification: wait, did we deploy the other one already? [19:37:21] this one depends on that [19:37:40] That's what the scap was for :) [19:37:46] ok [19:38:07] i wasn't watching the channel, just waiting for pings [19:39:45] :) [19:40:11] (03Merged) 10jenkins-bot: Set wgCategoryCollation for abwiki (Abkhaz Wikipedia) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406185 (https://phabricator.wikimedia.org/T183430) (owner: 10Bartosz Dziewoński) [19:40:37] (03CR) 10jenkins-bot: Set wgCategoryCollation for abwiki (Abkhaz Wikipedia) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/406185 (https://phabricator.wikimedia.org/T183430) (owner: 10Bartosz Dziewoński) [19:40:56] no_justification: this needs the updateCollation.php script run. you probably know this, but i'm saying just in case. 
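The claim that the CDB rebuild is "the bulk" of the scap run can be checked from the two durations quoted in the log. This is a minimal sketch, assuming the `NNm NNs` duration format seen in these log entries; the helper name `duration_seconds` is hypothetical.

```python
import re

# Matches durations written like "05m 12s" in the log entries.
DURATION_RE = re.compile(r"(\d+)m (\d+)s")


def duration_seconds(text):
    """Extract the first 'NNm NNs' duration from text and return it in seconds."""
    match = DURATION_RE.search(text)
    if match is None:
        raise ValueError("no duration found in: " + text)
    minutes, seconds = int(match.group(1)), int(match.group(2))
    return minutes * 60 + seconds


# Both strings are copied from the log entries above.
scap_total = duration_seconds(
    "Finished scap: adding collation for Abkhaz (duration: 05m 12s)"
)
cdb_rebuild = duration_seconds("Finished scap-cdb-rebuild (duration: 02m 53s)")

remainder = scap_total - cdb_rebuild   # time left if the rebuild were free
share = cdb_rebuild / scap_total       # fraction of the run spent rebuilding

print(scap_total, cdb_rebuild, remainder, round(share, 3))
```

On these figures the rebuild accounts for a little over half of the run, and removing it would leave a bit more than two minutes, close to the "1-2 min" estimate given in the channel.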
[19:42:26] PROBLEM - Nginx local proxy to apache on mw2123 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:43:13] MatmaRex: Yep, thanks :) [19:43:16] RECOVERY - Nginx local proxy to apache on mw2123 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.206 second response time [19:44:21] 10Operations, 10Traffic, 10Accessibility, 10Browser-Support-Internet-Explorer: Wikipedia no longer accessible to those using some braille devices - https://phabricator.wikimedia.org/T185582#3947244 (10Cameron11598) @TheDJ No problem! I think accessibility is important so I tend to grab these tickets when... [19:44:25] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3947245 (10chasemp) [19:44:49] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3917465 (10chasemp) [19:44:49] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: collation for abwiki (duration: 00m 55s) [19:45:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:45:19] (03PS13) 10Andrew Bogott: openstack horizon: rough in manifests for source deploy of Horizon 'ocata' [puppet] - 10https://gerrit.wikimedia.org/r/406853 (https://phabricator.wikimedia.org/T168470) [19:45:57] (03CR) 10jerkins-bot: [V: 04-1] openstack horizon: rough in manifests for source deploy of Horizon 'ocata' [puppet] - 10https://gerrit.wikimedia.org/r/406853 (https://phabricator.wikimedia.org/T168470) (owner: 10Andrew Bogott) [19:46:21] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3917465 (10chasemp) [19:46:53] (03PS14) 10Andrew Bogott: openstack horizon: rough in manifests for source deploy of Horizon 'ocata' [puppet] - 10https://gerrit.wikimedia.org/r/406853 (https://phabricator.wikimedia.org/T168470) [19:59:38] (03PS1) 10Jdlrobson: 
Update the ps mobile wordmark [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408348 (https://phabricator.wikimedia.org/T184442) [20:00:01] MatmaRex: 6452 rows processed [20:00:04] tgr: My dear minions, it's time we take the moon! Just kidding. Time for deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180205T2000). [20:00:04] No GERRIT patches in the queue for this window AFAICS. [20:00:45] no_justification: yay thanks [20:01:17] (03PS2) 10Jdlrobson: Update the ps mobile wordmark [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408348 (https://phabricator.wikimedia.org/T184442) [20:03:00] (03CR) 10Zoranzoki21: [C: 031] Update the ps mobile wordmark [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408348 (https://phabricator.wikimedia.org/T184442) (owner: 10Jdlrobson) [20:03:14] (03PS3) 10Jdlrobson: Update the ps mobile wordmark [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408348 (https://phabricator.wikimedia.org/T184442) [20:08:13] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3947345 (10chasemp) [20:09:47] (03PS1) 10Jdlrobson: Configure settings feedback link [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408354 (https://phabricator.wikimedia.org/T182217) [20:13:00] (03CR) 10Chad: "We've only got a ~60% cache hit rate on groups, yeah we should double this. Same with groups_byuuid, while we're here." [puppet] - 10https://gerrit.wikimedia.org/r/407927 (owner: 10Paladox) [20:13:28] no_justification is double 4048? [20:13:49] What's the default? [20:13:53] 1024? [20:14:28] loading groups since 2 minutes ago, no results yet [20:14:50] i will have to look at the docs :) [20:15:04] in a sec. /me is backporting the disable "private change feature" [20:15:17] which will need someone from wmf to say that it's a wmf feature request. [20:16:11] disabling private changes? 
:( [20:16:51] Hauskatze gerrit 2.15 removes drafts and splits it into two features [20:16:54] wip and private changes [20:17:56] no_justification https://gerrit-review.googlesource.com/#/c/gerrit/+/157390/ [20:18:42] I had to do that all manually (copy and pasting) :) [20:19:29] is it possible to set a puppet rule to disable all mail coming from a specific IP? [20:19:38] 2300 emails from it [20:19:47] it's a bit... annoying [20:20:14] Hauskatze nope, that's actually intended [20:20:17] that bug was fixed in 2.13 [20:20:28] but with notedb introduction in 2.14, it works differently. [20:20:30] paladox: I meant for OTRS [20:20:34] oh i see [20:23:31] (03PS1) 10Gergő Tisza: Enable AICaptcha data collection on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408360 (https://phabricator.wikimedia.org/T186244) [20:27:48] no_justification oh [20:27:51] it's unlimited [20:27:52] "groups": default is unlimited [20:28:23] https://gerrit-review.googlesource.com/Documentation/config-gerrit.html#cache [20:30:50] Then....Hmm [20:30:55] Then you're putting a cap on it ;-) [20:31:06] All we'd want is loadOnStartup, but that doesn't seem to be the problem here. [20:31:13] heh just realised :) [20:31:14] (we've been started-up for 3 days) [20:31:43] (03PS1) 10Gergő Tisza: Enable AICaptcha data collection on group0/group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408363 (https://phabricator.wikimedia.org/T186244) [20:31:45] (03PS1) 10Gergő Tisza: Enable AICaptcha data collection everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408364 (https://phabricator.wikimedia.org/T186244) [20:32:05] Apparently the slowness may be fixed in another release. 
I did request someone performance improvements in https://bugs.chromium.org/p/gerrit/issues/detail?id=8278 :) [20:45:08] (03CR) 10Gergő Tisza: [C: 032] Enable AICaptcha data collection on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408360 (https://phabricator.wikimedia.org/T186244) (owner: 10Gergő Tisza) [20:45:13] 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#3947432 (10Ottomata) Thanks @bblack, it's at least good to know that we'll need to do the IPSec thing or this will block us for a long while. I... [20:46:48] (03Merged) 10jenkins-bot: Enable AICaptcha data collection on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408360 (https://phabricator.wikimedia.org/T186244) (owner: 10Gergő Tisza) [20:48:41] !log tgr@tin Started scap: T186244 backporting patches and enabling AICaptcha data collection on testwiki [20:48:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:48:56] T186244: Deploy AICaptcha data collection - https://phabricator.wikimedia.org/T186244 [20:50:05] (03CR) 10jenkins-bot: Enable AICaptcha data collection on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408360 (https://phabricator.wikimedia.org/T186244) (owner: 10Gergő Tisza) [20:54:07] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1 [20:57:25] hmm [21:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: My dear minions, it's time we take the moon! Just kidding. Time for Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180205T2100). 
[21:00:05] No GERRIT patches in the queue for this window AFAICS. [21:00:24] I'm overrunning my window a bit [21:00:41] MediaWiki-only though so I don't think there's any interference [21:07:05] !log tgr@tin Finished scap: T186244 backporting patches and enabling AICaptcha data collection on testwiki (duration: 18m 24s) [21:07:07] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1 [21:07:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:07:18] T186244: Deploy AICaptcha data collection - https://phabricator.wikimedia.org/T186244 [21:09:42] 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labvirt102[12] - https://phabricator.wikimedia.org/T183937#3947552 (10chasemp) ping @robh it seemed like you had some ideas here during the meeting today, could you coordinate with @andrew on our side if we can help? [21:12:38] (done) [21:14:47] (03PS1) 10Groovier1: Adding config for WikimediaEvents module for logging behaviour data [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408381 (https://phabricator.wikimedia.org/T186244) [21:20:06] PROBLEM - puppet last run on mw1336 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:22:57] PROBLEM - puppet last run on labsdb1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:25:28] (03PS3) 10Mobrovac: [JobQueue] Enable htmlCacheUpdate on new infrastructure for all projects. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/404598 (https://phabricator.wikimedia.org/T182023) (owner: 10Ppchelko) [21:27:14] (03PS1) 10Groovier1: Adding config for WikimediaEvents module for logging behaviour data [mediawiki-config] - 10https://gerrit.wikimedia.org/r/408386 (https://phabricator.wikimedia.org/T186244) [21:27:16] PROBLEM - configured eth on labtestnet2002 is CRITICAL: eth1 reporting no carrier. [21:30:47] 10Operations, 10Gerrit, 10Patch-For-Review, 10Performance: New gerrit login ui is causing performance problems when going through gerrit.wikimedia.org - https://phabricator.wikimedia.org/T185506#3947615 (10demon) >>! In T185506#3920908, @Krinkle wrote: > Perhaps ask them why they wanted it this way, and we... [21:33:03] 10Operations, 10ops-eqiad, 10Patch-For-Review: rack/setup/install labvirt102[12] - https://phabricator.wikimedia.org/T183937#3947621 (10RobH) So, it seems these were not ordered with the right kind of network card. Ideally, we keep the onboard 4x1GB and add a second broadcom dual port 10G. However, these t... [21:39:30] fyi, Pchelolo and I will be taking over for the next 20 mins [21:39:43] (03CR) 10Mobrovac: [C: 032] [JobQueue] Enable htmlCacheUpdate on new infrastructure for all projects. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/404598 (https://phabricator.wikimedia.org/T182023) (owner: 10Ppchelko) [21:41:10] !log mholloway-shell@tin Started deploy [mobileapps/deploy@6cae404]: Update mobileapps to 3140b1a [21:41:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:43:41] (03Merged) 10jenkins-bot: [JobQueue] Enable htmlCacheUpdate on new infrastructure for all projects. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/404598 (https://phabricator.wikimedia.org/T182023) (owner: 10Ppchelko) [21:43:57] (03CR) 10jenkins-bot: [JobQueue] Enable htmlCacheUpdate on new infrastructure for all projects. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/404598 (https://phabricator.wikimedia.org/T182023) (owner: 10Ppchelko) [21:45:09] !log ppchelko@tin Started deploy [cpjobqueue/deploy@aebfded]: Enble htmlCacheUpdate job for all wikis T182023 [21:45:16] !log asw-b-codfw# rollback 0 pending questions on T183167 [21:45:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:45:22] T182023: Migrate htmlCacheUpdate job to Kafka - https://phabricator.wikimedia.org/T182023 [21:45:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:45:33] T183167: Connect labtestvirt2003 eth1 and eth2 interface(s) to switch fabric - https://phabricator.wikimedia.org/T183167 [21:47:08] 10Operations, 10ops-codfw, 10Cloud-VPS: Connect labtestvirt2003 eth1 and eth2 interface(s) to switch fabric - https://phabricator.wikimedia.org/T183167#3947675 (10RobH) a:05RobH>03Papaul Ok, so what is on this task doesn't match what is on the switch stack. Unfortuantely, I don't recall if we ever set t... [21:47:35] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@aebfded]: Enble htmlCacheUpdate job for all wikis T182023 (duration: 02m 27s) [21:47:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:47:48] !log mholloway-shell@tin Finished deploy [mobileapps/deploy@6cae404]: Update mobileapps to 3140b1a (duration: 06m 38s) [21:47:56] PROBLEM - Check systemd state on scb2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[21:47:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:50:06] RECOVERY - puppet last run on mw1336 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:52:09] systemd state on scb2001 known ^ [21:52:24] working on it [21:53:06] RECOVERY - puppet last run on labsdb1007 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [22:00:05] bawolff and Reedy: (Dis)respected human, time to deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180205T2200). Please do the needful. [22:00:05] No GERRIT patches in the queue for this window AFAICS. [22:01:35] well if Reedy is around there's a security patch for an extension which needs to be uploaded and merged [22:01:39] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3947692 (10chasemp) I see a PGP key for bstorm@wikimedia.org. Let's sign during the team meeting and f2f tomorrow :) [22:14:09] 10Operations, 10Cloud-VPS, 10cloud-services-team (Kanban): rack/setup/install labnodepool1002.eqiad.wmnet - https://phabricator.wikimedia.org/T168407#3947728 (10chasemp) @greg can you weigh in here? We are attempting to reallocate this if you don't object. 
[22:25:41] (PS1) Chad: Adding gitiles.jar @ stable-2.14 [software/gerrit] - https://gerrit.wikimedia.org/r/408437
[22:26:12] (CR) Paladox: [C: 1] "tested @ https://gerrit.git.wmflabs.org/r/plugins/gitiles/ and works :)" [software/gerrit] - https://gerrit.wikimedia.org/r/408437 (owner: Chad)
[22:36:32] (CR) Dzahn: "http://puppet-compiler.wmflabs.org/9860/ but still compiling" [puppet] - https://gerrit.wikimedia.org/r/406794 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[22:37:43] Operations, ops-eqiad, hardware-requests, Patch-For-Review, cloud-services-team (Kanban): Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T184832#3947756 (RobH) a:Cmjohnson>RobH >>! In T184832#3905347, @Marostegui wrote: > I believe this is now ready for @Cm...
[22:42:21] !log ppchelko@tin Started deploy [cpjobqueue/deploy@4543102]: Revert the switch to librdkafka 0.11 and enable htmlCacheUpdate
[22:42:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:43:15] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@4543102]: Revert the switch to librdkafka 0.11 and enable htmlCacheUpdate (duration: 00m 54s)
[22:43:16] PROBLEM - Check systemd state on labstore1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[22:43:16] RECOVERY - Check systemd state on scb2001 is OK: OK - running: The system is fully operational
[22:43:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:44:07] Operations, ops-eqiad, hardware-requests, Patch-For-Review, cloud-services-team (Kanban): Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T184832#3947763 (RobH)
[22:45:18] !log mobrovac@tin Synchronized wmf-config/jobqueue.php: EventBus: Enable htmlCacheUpdate jobs for all projects - T182023 (duration: 00m 56s)
[22:45:22] labstore1007 alerts madhuvishy ^
[22:45:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:45:32] T182023: Migrate htmlCacheUpdate job to Kafka - https://phabricator.wikimedia.org/T182023
[22:45:32] chasemp: yep I got it
[22:46:16] RECOVERY - Check systemd state on labstore1007 is OK: OK - running: The system is fully operational
[22:46:25] !log mobrovac@tin Synchronized wmf-config/InitialiseSettings.php: EventBus: Enable htmlCacheUpdate jobs for all projects - T182023 (duration: 00m 55s)
[22:46:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:48:39] (PS1) Bstorm: Add MST to timeperiods.cfg for monitoring [puppet] - https://gerrit.wikimedia.org/r/408445 (https://phabricator.wikimedia.org/T185493)
[22:50:10] Operations, ops-eqiad, hardware-requests, Patch-For-Review, cloud-services-team (Kanban): Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T184832#3947771 (RobH) So while these appear to not be in use, the puppet repo was NOT cleared of references before escalation t...
[22:50:39] (CR) Madhuvishy: [C: 1] "Looks good to me, +1 to merge :)" [puppet] - https://gerrit.wikimedia.org/r/408445 (https://phabricator.wikimedia.org/T185493) (owner: Bstorm)
[22:51:09] (PS1) RobH: decom of labsdb100[13] [puppet] - https://gerrit.wikimedia.org/r/408446 (https://phabricator.wikimedia.org/T184832)
[22:51:39] (CR) RobH: [C: 2] decom of labsdb100[13] [puppet] - https://gerrit.wikimedia.org/r/408446 (https://phabricator.wikimedia.org/T184832) (owner: RobH)
[22:51:46] PROBLEM - configured eth on labtestmetal2001 is CRITICAL: eth1 reporting no carrier.
[22:52:49] Operations, ops-codfw, Cloud-VPS: Connect labtestvirt2003 eth1 and eth2 interface(s) to switch fabric - https://phabricator.wikimedia.org/T183167#3947773 (chasemp) > labtestvirt2003:eth2 is also in the list above
[22:53:03] (PS1) RobH: decom of labsdb100[13] production dns [dns] - https://gerrit.wikimedia.org/r/408448 (https://phabricator.wikimedia.org/T184832)
[22:53:38] (CR) RobH: [C: 2] decom of labsdb100[13] production dns [dns] - https://gerrit.wikimedia.org/r/408448 (https://phabricator.wikimedia.org/T184832) (owner: RobH)
[22:53:47] (CR) Madhuvishy: [C: 1] "One small note, we usually prefix commit messages with a one word relevant subject, for example icinga: Add MST to timeperiods.cfg" [puppet] - https://gerrit.wikimedia.org/r/408445 (https://phabricator.wikimedia.org/T185493) (owner: Bstorm)
[22:55:13] Operations, ops-eqiad, hardware-requests, Patch-For-Review, cloud-services-team (Kanban): Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T184832#3947784 (RobH)
[22:55:54] Operations, ops-eqiad, hardware-requests, cloud-services-team (Kanban): Decommission labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T184832#3897875 (RobH) a:RobH>Cmjohnson Ok, now ready for onsite wipe. it looks like labsdb1003 may also have a disk shelf, please ensure all...
[22:58:44] (PS2) Bstorm: icinga: Add MST to timeperiods.cfg for monitoring [puppet] - https://gerrit.wikimedia.org/r/408445 (https://phabricator.wikimedia.org/T185493)
[23:02:20] (CR) Bstorm: [C: 2] icinga: Add MST to timeperiods.cfg for monitoring [puppet] - https://gerrit.wikimedia.org/r/408445 (https://phabricator.wikimedia.org/T185493) (owner: Bstorm)
[23:03:09] (CR) Chad: "Per IRC: the cache size is already set to unlimited, so there's not a ton to be gained here (if anything, we're capping the results). It s" [puppet] - https://gerrit.wikimedia.org/r/407927 (owner: Paladox)
[23:03:31] (Abandoned) Paladox: Gerrit: Cache groups [puppet] - https://gerrit.wikimedia.org/r/407927 (owner: Paladox)
[23:04:27] !log mobrovac@tin Started deploy [citoid/deploy@7bbc583]: Fix TypeError bug - T186395
[23:04:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:04:40] T186395: unable to make citation for http://apps.who.int/iris/handle/10665/70863 - https://phabricator.wikimedia.org/T186395
[23:06:06] I pushed up a change to puppet to add a timezone to Icinga. I'll do a puppet-merge if that's cool here.
[23:07:55] !log mobrovac@tin Finished deploy [citoid/deploy@7bbc583]: Fix TypeError bug - T186395 (duration: 03m 29s)
[23:08:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:08:55] Operations, ops-codfw, Cloud-VPS: Connect labtestvirt2003 eth1 and eth2 interface(s) to switch fabric - https://phabricator.wikimedia.org/T183167#3947803 (RobH) >>! In T183167#3947773, @chasemp wrote: >> labtestvirt2003:eth2 > > is also in the list above Indeed, I've corrected my comment to list th...
[23:09:01] Going ahead with it.
[23:13:57] PROBLEM - puppet last run on kubernetes1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:14:16] PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:14:16] PROBLEM - puppet last run on labvirt1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:14:36] PROBLEM - puppet last run on netmon1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:15:16] PROBLEM - puppet last run on mw1299 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:15:17] PROBLEM - puppet last run on db1081 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:15:17] PROBLEM - puppet last run on mw1240 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:15:17] PROBLEM - puppet last run on mw1268 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:15:17] PROBLEM - puppet last run on mw1290 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:16:06] PROBLEM - puppet last run on darmstadtium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:17:39] PROBLEM - puppet last run on kafka1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:17:46] PROBLEM - puppet last run on mw1273 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:18:06] PROBLEM - puppet last run on conf1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:18:06] PROBLEM - puppet last run on mw1269 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:18:06] PROBLEM - puppet last run on elastic1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:18:16] PROBLEM - puppet last run on mc1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:18:31] puppetdb ^^?
[23:18:36] PROBLEM - puppet last run on mw1257 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:18:46] PROBLEM - puppet last run on krypton is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:18:46] PROBLEM - puppet last run on mc1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:18:57] PROBLEM - puppet last run on mw1260 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:19:19] PROBLEM - puppet last run on mc1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:19:19] PROBLEM - puppet last run on dumpsdata1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:19:26] PROBLEM - puppet last run on hydrogen is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:19:53] herron ^^
[23:21:15] !log nihal - restarted puppetdb service
[23:21:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:22:18] puppet run on hydrogen worked
[23:23:06] PROBLEM - puppet last run on mw2184 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:23:07] Good night.
[23:23:17] PROBLEM - puppet last run on mw2196 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:23:46] RECOVERY - puppet last run on mc1029 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:23:59] paladox: yea
[23:24:06] thanks :)
[23:24:17] PROBLEM - puppet last run on mw2136 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:24:26] PROBLEM - puppet last run on elastic2036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:24:26] RECOVERY - puppet last run on hydrogen is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[23:24:26] PROBLEM - puppet last run on mw2188 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:26:47] PROBLEM - puppet last run on elastic2019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:26:57] Operations, Analytics, ChangeProp, EventBus, and 4 others: Migrate htmlCacheUpdate job to Kafka - https://phabricator.wikimedia.org/T182023#3947861 (Pchelolo) Open>Resolved Seems like the migration is complete with no issues. Resolving
[23:28:17] RECOVERY - puppet last run on mw2196 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[23:29:17] RECOVERY - puppet last run on mw2136 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[23:37:40] (CR) Chad: [C: 1] "This can land whenever (and accompanying private repo value can be dropped)" [puppet] - https://gerrit.wikimedia.org/r/407932 (owner: Paladox)
[23:42:40] (PS4) Dzahn: Gerrit: Remove certificate params [puppet] - https://gerrit.wikimedia.org/r/407932 (owner: Paladox)
[23:42:46] RECOVERY - puppet last run on mw1273 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[23:43:06] RECOVERY - puppet last run on conf1001 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[23:43:06] RECOVERY - puppet last run on elastic1037 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:43:16] RECOVERY - puppet last run on mc1022 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[23:43:36] RECOVERY - puppet last run on mw1257 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:43:46] RECOVERY - puppet last run on krypton is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:43:56] RECOVERY - puppet last run on kubernetes1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[23:43:56] RECOVERY - puppet last run on mw1260 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:44:16] RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:44:16] RECOVERY - puppet last run on labvirt1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:44:16] RECOVERY - puppet last run on mc1019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:44:16] RECOVERY - puppet last run on dumpsdata1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:44:27] (CR) Dzahn: [C: 2] Gerrit: Remove certificate params [puppet] - https://gerrit.wikimedia.org/r/407932 (owner: Paladox)
[23:44:36] RECOVERY - puppet last run on netmon1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:44:38] thanks :)
[23:45:16] RECOVERY - puppet last run on mw1299 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[23:45:17] RECOVERY - puppet last run on db1081 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[23:45:17] RECOVERY - puppet last run on mw1240 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[23:45:17] RECOVERY - puppet last run on mw1290 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[23:45:17] RECOVERY - puppet last run on mw1268 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:46:06] RECOVERY - puppet last run on darmstadtium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[23:47:06] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 78, down: 1, dormant: 0, excluded: 0, unused: 0
[23:47:37] RECOVERY - puppet last run on kafka1020 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[23:48:06] RECOVERY - puppet last run on mw1269 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[23:48:06] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 79, down: 0, dormant: 0, excluded: 0, unused: 0
[23:48:27] (CR) Dzahn: [C: 2] "deleted the line from the private repo" [puppet] - https://gerrit.wikimedia.org/r/407932 (owner: Paladox)
[23:48:31] :)
[23:51:47] RECOVERY - puppet last run on elastic2019 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[23:53:06] RECOVERY - puppet last run on mw2184 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:54:17] RECOVERY - puppet last run on elastic2036 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[23:54:26] RECOVERY - puppet last run on mw2188 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures