[00:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: It is that lovely time of the day again! You are hereby commanded to deploy Evening SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181115T0000). [00:00:04] niedzielski: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [00:00:28] o/ [00:00:35] (03PS1) 10Cwhite: icinga: copy stretch options to icinga2001 [puppet] - 10https://gerrit.wikimedia.org/r/473642 (https://phabricator.wikimedia.org/T208824) [00:01:53] niedzielski: I can SWAT. [00:02:04] (03CR) 10Niharika29: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473221 (https://phabricator.wikimedia.org/T208755) (owner: 10Niedzielski) [00:02:09] hey Niharika ! that'd be great! [00:03:29] (03Merged) 10jenkins-bot: Prod: increase Schema.org page split test to 5% sampling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473221 (https://phabricator.wikimedia.org/T208755) (owner: 10Niedzielski) [00:03:48] (03CR) 10Dzahn: [C: 031] icinga: copy stretch options to icinga2001 [puppet] - 10https://gerrit.wikimedia.org/r/473642 (https://phabricator.wikimedia.org/T208824) (owner: 10Cwhite) [00:04:24] (03CR) 10Cwhite: [C: 032] icinga: copy stretch options to icinga2001 [puppet] - 10https://gerrit.wikimedia.org/r/473642 (https://phabricator.wikimedia.org/T208824) (owner: 10Cwhite) [00:04:48] niedzielski: It's on mwdebug1002. [00:05:02] awesome. i'll start testing now. [00:07:09] i am seeing the change and evaluating. [00:07:44] (03CR) 10jenkins-bot: Prod: increase Schema.org page split test to 5% sampling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473221 (https://phabricator.wikimedia.org/T208755) (owner: 10Niedzielski) [00:07:51] niedzielski: Cool. Ping me when done. 
[00:08:28] Niharika: ok thanks. I think I'll need about 15-20 minutes if that's alright. [00:08:45] (03PS1) 10Paladox: Gerrit: Support git protocol version 2 [puppet] - 10https://gerrit.wikimedia.org/r/473643 [00:08:46] niedzielski: Sure. You're the only client. :) [00:08:56] 👍 [00:10:25] (03PS2) 10Paladox: Gerrit: Support git protocol version 2 [puppet] - 10https://gerrit.wikimedia.org/r/473643 [00:12:13] RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational [00:20:15] (03PS1) 10Papaul: DNS: Add production and mgmt DNS entries for ms-be200[4-9] and ms-be2050 [dns] - 10https://gerrit.wikimedia.org/r/473646 (https://phabricator.wikimedia.org/T209395) [00:26:57] Niharika: ok, i think we're good to go. [00:29:14] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install new ms-be servers ms-be204[4-9] ,ms-be2050 - https://phabricator.wikimedia.org/T209395 (10Papaul) [00:32:57] (03CR) 10Dzahn: [C: 04-1] DNS: Add production and mgmt DNS entries for ms-be200[4-9] and ms-be2050 (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/473646 (https://phabricator.wikimedia.org/T209395) (owner: 10Papaul) [00:33:08] 10Operations, 10New-Readers, 10Patch-For-Review: Create URL for Mexico Awareness Campaign - https://phabricator.wikimedia.org/T207816 (10jrbs) >>! In T207816#4744216, @atgo wrote: > @Dzahn just commented on the wrong task, whoops. Reopening because: > > @Dzahn sometimes I'm getting a Bugzilla page instead... [00:34:17] (03PS2) 10Thcipriani: Gerrit: add basic robots.txt for proxy [puppet] - 10https://gerrit.wikimedia.org/r/473638 (https://phabricator.wikimedia.org/T209456) [00:35:00] 10Operations, 10New-Readers, 10Patch-For-Review: Create URL for Mexico Awareness Campaign - https://phabricator.wikimedia.org/T207816 (10Dzahn) @jrbs Oh yea, it's fixed. 
Just failed to comment here and did instead on T202592#4744207 [00:35:25] 10Operations, 10New-Readers, 10Patch-For-Review: Create URL for Mexico Awareness Campaign - https://phabricator.wikimedia.org/T207816 (10Dzahn) 05Open>03Resolved It was actually fixed by Brandon by just restarting one of the 2 backend servers. [00:48:14] niedzielski: Sorry, missed your message. [00:48:16] Syncing now... [00:48:27] np, sounds good! [00:50:02] !log niharika29@deploy1001 Synchronized wmf-config/InitialiseSettings.php: increase Schema.org page split test to 5% sampling T208755 (duration: 00m 54s) [00:50:04] niedzielski: Done. ^ [00:50:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:50:06] T208755: Launch A/B test for sameAs property - https://phabricator.wikimedia.org/T208755 [00:50:39] Niharika: thanks, I'm checking things now. Should take me a bit. [00:50:46] Sure. [00:51:13] (03CR) 1020after4: [C: 031] Gerrit: add basic robots.txt for proxy [puppet] - 10https://gerrit.wikimedia.org/r/473638 (https://phabricator.wikimedia.org/T209456) (owner: 10Thcipriani) [00:53:41] I see the changes and am testing. [00:55:43] (03PS1) 10Bstorm: sonofgridengine: reworking bastion for stretch and docker [puppet] - 10https://gerrit.wikimedia.org/r/473647 (https://phabricator.wikimedia.org/T200557) [01:00:04] twentyafterfour: Time to snap out of that daydream and deploy Phabricator update. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181115T0100). [01:09:48] Niharika: everything seems to be in order. thank you for your help, Niharika ! [01:09:59] niedzielski: Sure thing. [01:40:27] PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[01:41:35] RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational [01:48:23] (03PS1) 10Kosta Harlan: Configure sensitive namespaces for EditorJourney schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473653 (https://phabricator.wikimedia.org/T207307) [01:49:09] (03PS2) 10Kosta Harlan: Configure sensitive namespaces for EditorJourney schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473653 (https://phabricator.wikimedia.org/T207307) [01:50:29] (03PS3) 10Kosta Harlan: Configure sensitive namespaces for EditorJourney schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473653 (https://phabricator.wikimedia.org/T207307) [02:14:18] 10Operations, 10Performance-Team, 10Wikidata, 10Wikidata-Query-Service: Errors trying to fetch RDF from Wikidata - https://phabricator.wikimedia.org/T207718 (10Imarlier) Hrm. In that case, very likely that you're right, and what I'm seeing is the retry. Out of curiosity, have you examined GC behavior aro... 
[02:24:17] RECOVERY - MariaDB Slave Lag: s3 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.26 seconds [03:22:07] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 673.97 seconds [04:11:59] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 176.15 seconds [05:11:13] RECOVERY - ensure kvm processes are running on labvirt1015 is OK: PROCS OK: 2 processes with regex args /usr/bin/kvm [05:47:10] (03PS1) 10Marostegui: pc2008.yaml: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/473662 (https://phabricator.wikimedia.org/T208383) [05:49:57] RECOVERY - MariaDB Slave SQL: x1 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes [05:51:46] (03PS1) 10Marostegui: db-codfw.php: Add pc2008 to pc2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473665 (https://phabricator.wikimedia.org/T208383) [05:54:11] (03CR) 10Marostegui: [C: 032] pc2008.yaml: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/473662 (https://phabricator.wikimedia.org/T208383) (owner: 10Marostegui) [05:56:29] (03CR) 10Marostegui: [C: 032] db-codfw.php: Add pc2008 to pc2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473665 (https://phabricator.wikimedia.org/T208383) (owner: 10Marostegui) [05:57:54] (03Merged) 10jenkins-bot: db-codfw.php: Add pc2008 to pc2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473665 (https://phabricator.wikimedia.org/T208383) (owner: 10Marostegui) [05:59:07] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Pool pc2008 in pc2 - T208383 (duration: 00m 56s) [05:59:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:59:10] T208383: Implement parsercache service on pc[12]0(07|08|09|10) and replace leased pc[12]00[456] - https://phabricator.wikimedia.org/T208383 [05:59:27] 10Operations, 10DBA, 10Patch-For-Review, 10User-Banyek: Implement parsercache service on pc[12]0(07|08|09|10) and 
replace leased pc[12]00[456] - https://phabricator.wikimedia.org/T208383 (10Marostegui) pc2008 is now online for pc2. [05:59:41] 10Operations, 10DBA, 10Patch-For-Review, 10User-Banyek: Implement parsercache service on pc[12]0(07|08|09|10) and replace leased pc[12]00[456] - https://phabricator.wikimedia.org/T208383 (10Marostegui) [06:01:57] (03PS1) 10Marostegui: db-codfw.php: Depool pc2006 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473666 (https://phabricator.wikimedia.org/T208383) [06:03:23] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool pc2006 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473666 (https://phabricator.wikimedia.org/T208383) (owner: 10Marostegui) [06:04:47] (03Merged) 10jenkins-bot: db-codfw.php: Depool pc2006 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473666 (https://phabricator.wikimedia.org/T208383) (owner: 10Marostegui) [06:05:03] !log marostegui@deploy1001 sync-file aborted: Dool pc2006 - T208383 (duration: 00m 00s) [06:05:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:05:06] T208383: Implement parsercache service on pc[12]0(07|08|09|10) and replace leased pc[12]00[456] - https://phabricator.wikimedia.org/T208383 [06:06:04] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Depool pc2006 - T208383 (duration: 00m 53s) [06:06:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:06:19] PROBLEM - puppet last run on icinga1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [06:06:31] !log Stop MySQL on pc2006 to clone pc2009 - T208383 [06:06:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:06:46] (03CR) 10jenkins-bot: db-codfw.php: Add pc2008 to pc2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473665 (https://phabricator.wikimedia.org/T208383) (owner: 10Marostegui) [06:06:48] (03CR) 10jenkins-bot: db-codfw.php: Depool pc2006 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473666 (https://phabricator.wikimedia.org/T208383) (owner: 10Marostegui) [06:11:55] RECOVERY - MariaDB Slave Lag: x1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [06:16:35] RECOVERY - puppet last run on icinga1001 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [06:20:57] 10Operations, 10Math, 10Mathoid: Default $wgMathMathMLUrl points to non-working mathoid.testme.wmflabs.org - https://phabricator.wikimedia.org/T209563 (10edwardspec) [06:25:33] 10Operations, 10DBA, 10JADE, 10TechCom-RFC, 10Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10Marostegui) >>! In T200297#4745149, @awight wrote: > @Marostegui Pinging for review of these two files, h... [06:30:43] PROBLEM - puppet last run on mw1289 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/etc/profile.d/mysql-ps1.sh] [06:51:21] (03PS1) 10Marostegui: db-codfw.php: Depool db2071 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473667 [06:53:07] RECOVERY - puppet last run on analytics1039 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:53:07] 10Operations, 10ops-eqiad, 10Analytics, 10DC-Ops: Degraded RAID on analytics1039 - https://phabricator.wikimedia.org/T208706 (10elukey) This host needs to be decommed relatively soon, there are others that showed this kind of behavior in the range 28-42. [06:55:08] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2071 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473667 (owner: 10Marostegui) [06:56:12] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2071 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473667 (owner: 10Marostegui) [06:56:21] RECOVERY - puppet last run on mw1289 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [06:57:23] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Depool db2071 (duration: 00m 54s) [06:57:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:57:32] !log Stop MySQL on db2071 to upgrade MySQL and kernel [06:57:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:03:02] (03PS1) 10Elukey: Apply -R 200 to memcached on mc1019 [puppet] - 10https://gerrit.wikimedia.org/r/473669 (https://phabricator.wikimedia.org/T208844) [07:05:15] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1002/13505/" [puppet] - 10https://gerrit.wikimedia.org/r/473669 (https://phabricator.wikimedia.org/T208844) (owner: 10Elukey) [07:05:33] (03CR) 10Elukey: [C: 032] Apply -R 200 to memcached on mc1019 [puppet] - 10https://gerrit.wikimedia.org/r/473669 (https://phabricator.wikimedia.org/T208844) (owner: 10Elukey) [07:05:49] (03CR) 10jenkins-bot: db-codfw.php: Depool db2071 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/473667 (owner: 10Marostegui) [07:06:59] PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:08:23] !log memcached on mc1019 restarted to apply -R 200 - T208844 [07:08:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:08:27] T208844: Apply -R 200 to all the memcached mw object cache instances running in eqiad/codfw - https://phabricator.wikimedia.org/T208844 [07:10:30] (03PS1) 10Urbanecm: Change sitename of shnwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473670 (https://phabricator.wikimedia.org/T206777) [07:19:21] RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational [07:29:33] PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [07:39:09] (03PS1) 10Elukey: Apply a hadoop config override to analytics1039 [puppet] - 10https://gerrit.wikimedia.org/r/473684 (https://phabricator.wikimedia.org/T208706) [07:41:20] (03CR) 10Elukey: [C: 032] "https://puppet-compiler.wmflabs.org/compiler1002/13506/analytics1039.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/473684 (https://phabricator.wikimedia.org/T208706) (owner: 10Elukey) [07:41:57] RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational [07:42:30] !log Drop site_stats.ss_total_views from labswiki - T86339 [07:42:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:42:33] T86339: Dropping site_stats.ss_total_views on wmf databases - https://phabricator.wikimedia.org/T86339 [07:45:27] (03CR) 10Filippo Giunchedi: [C: 031] Remove Diamond from Elastic hosts [puppet] - 10https://gerrit.wikimedia.org/r/473561 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [07:50:53] (03PS1) 10Elukey: Apply an override to analytics1039 - part 2 [puppet] - 
10https://gerrit.wikimedia.org/r/473688 (https://phabricator.wikimedia.org/T208706) [07:51:38] (03CR) 10Elukey: [C: 032] Apply an override to analytics1039 - part 2 [puppet] - 10https://gerrit.wikimedia.org/r/473688 (https://phabricator.wikimedia.org/T208706) (owner: 10Elukey) [07:54:38] 10Operations, 10ops-eqiad, 10Analytics, 10DC-Ops, 10Patch-For-Review: Degraded RAID on analytics1039 - https://phabricator.wikimedia.org/T208706 (10elukey) 05Open>03Resolved a:03elukey The host will be decommed so no point in getting a new disk :) [07:57:48] (03PS2) 10Muehlenhoff: Remove Diamond from Kubernetes hosts [puppet] - 10https://gerrit.wikimedia.org/r/473490 (https://phabricator.wikimedia.org/T183454) [08:07:21] (03PS22) 10Mathew.onipe: elasticsearch_cluster: multi-cluster/multi-instance support [software/spicerack] - 10https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) [08:07:25] (03PS17) 10Mathew.onipe: elasticsearch: cookbook for multi-cluster services rolling restart [cookbooks] - 10https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919) [08:11:12] (03PS2) 10Muehlenhoff: Remove Diamond from Elastic hosts [puppet] - 10https://gerrit.wikimedia.org/r/473561 (https://phabricator.wikimedia.org/T183454) [08:13:45] (03CR) 10Muehlenhoff: [C: 032] Remove Diamond from Elastic hosts [puppet] - 10https://gerrit.wikimedia.org/r/473561 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [08:18:12] 10Operations, 10Maps, 10Discovery-Search (Current work): Update SQL location script for osm-initial-import - https://phabricator.wikimedia.org/T209566 (10Mathew.onipe) [08:18:16] 10Operations, 10Maps, 10Discovery-Search (Current work): Update SQL location script for osm-initial-import - https://phabricator.wikimedia.org/T209566 (10Mathew.onipe) p:05Triage>03Normal [08:21:03] !log draining ganeti1004 for reboot/kernel security update [08:21:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log 
[08:28:20] 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review, 10User-Elukey: rack/setup/install an-worker10[78-96].eqiad.wmnet - https://phabricator.wikimedia.org/T207192 (10elukey) a:05elukey>03Cmjohnson [08:37:23] PROBLEM - Check systemd state on elastic1035 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [08:38:10] 10Operations, 10Gerrit, 10Release-Engineering-Team (Watching / External): Add prometheus exporter to Gerrit - https://phabricator.wikimedia.org/T184086 (10fgiunchedi) Thanks @thcipriani for working on this! From a quick look it seems javamelody offers a deeper look into the jvm rather than pure JMX and expos... [08:39:19] (03PS4) 10Ema: fifo-log-demux 0.1 [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/473432 (https://phabricator.wikimedia.org/T204225) [08:39:26] 10Operations: rsync puppet module doesn't delete removed config - https://phabricator.wikimedia.org/T205618 (10MoritzMuehlenhoff) a:03MoritzMuehlenhoff [08:41:47] PROBLEM - puppet last run on elastic1035 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 6 minutes ago with 3 failures. 
Failed resources (up to 3 shown): Package[diamond],Package[python-diamond] [08:43:09] (03PS1) 10Tarrow: Set Wikibase string-limits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473694 (https://phabricator.wikimedia.org/T154660) [08:45:16] !log depooling db1088 due a schema change (T85757) [08:45:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:45:19] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [08:45:29] (03CR) 10Banyek: [C: 032] mariadb: depool db1088 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473539 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [08:45:48] (03CR) 10jenkins-bot: mariadb: depool db1088 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473539 (https://phabricator.wikimedia.org/T85757) (owner: 10Banyek) [08:45:49] PROBLEM - Check systemd state on elastic1049 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [08:46:14] (03CR) 10Ema: [V: 032 C: 032] fifo-log-demux 0.1 [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/473432 (https://phabricator.wikimedia.org/T204225) (owner: 10Ema) [08:47:21] * onimisionipe is looking into elastic1035 and 1049 [08:48:07] (03PS2) 10Tarrow: Set Wikibase string-limits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473694 (https://phabricator.wikimedia.org/T154660) [08:48:35] (03CR) 10Tarrow: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473694 (https://phabricator.wikimedia.org/T154660) (owner: 10Tarrow) [08:49:01] !log banyek@deploy1001 Synchronized wmf-config/db-eqiad.php: T85757: depool db1088 (duration: 00m 53s) [08:49:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:49:36] PROBLEM - puppet last run on elastic1049 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 6 minutes ago with 3 failures. 
Failed resources (up to 3 shown): Package[diamond],Package[python-diamond] [08:50:36] moritzm: [08:50:52] ^ [08:50:59] moritzm: ^ [08:52:23] deleting 8000 revisions failed with replica lags stuff, I thought that stuff got some fixes? [08:52:36] uh [08:52:44] and then marked as deleted >_> [08:56:47] !log Deploy schema change on db1088 (T85757) [08:56:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:56:50] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [08:57:01] onimisionipe: looking, these should resolve themselves [08:58:07] it's a bug in the Diamond package in stretch which affects the removal of the package, see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=910787 [08:58:08] RECOVERY - Check systemd state on elastic1049 is OK: OK - running: The system is fully operational [08:58:23] a second puppet run fixes it, it only happens for a few hosts, not in general [08:58:34] some puppet race is also involved I guess [08:58:54] !log upload fifo-log-demux 0.1 to stretch-wikimedia T204225 [08:58:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:58:57] T204225: ATS: log inspection at runtime - https://phabricator.wikimedia.org/T204225 [08:59:07] or probably rather load-related and independent of puppet as it mostly affects busy systems [08:59:36] RECOVERY - puppet last run on elastic1049 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:59:46] 1035 and 1049 cleared up [09:00:00] RECOVERY - Check systemd state on elastic1035 is OK: OK - running: The system is fully operational [09:00:30] moritzm: They are back up now. [09:00:36] Thanks! 
[09:01:14] I see about the bug in Diamond [09:02:00] RECOVERY - puppet last run on elastic1035 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:03:51] !log repooling db1088 (T85757) [09:03:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:03:54] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [09:04:10] (03PS1) 10Banyek: Revert "mariadb: depool db1088" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473699 [09:05:55] (03CR) 10Banyek: [C: 032] Revert "mariadb: depool db1088" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473699 (owner: 10Banyek) [09:08:34] !log banyek@deploy1001 Synchronized wmf-config/db-eqiad.php: T85757: repool db1088 (duration: 00m 53s) [09:08:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:08:38] !log reset failed debmonitor session in ms-be2038 [09:08:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:08:56] RECOVERY - Check systemd state on ms-be2038 is OK: OK - running: The system is fully operational [09:11:06] 10Operations: The icinga web interface can't read the icinga log file - https://phabricator.wikimedia.org/T209568 (10Joe) [09:11:27] (03PS1) 10Marostegui: db-eqiad.php: Depool db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473700 [09:12:03] (03PS2) 10Ema: Add module to install and configure fifo-log-demux [puppet] - 10https://gerrit.wikimedia.org/r/473554 (https://phabricator.wikimedia.org/T204225) [09:12:24] (03PS2) 10Marostegui: db-eqiad.php: Depool db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473700 [09:13:04] 10Operations, 10Maps: Make nodes.bin cache file writable by osmupdater after it is created by osmimporter - https://phabricator.wikimedia.org/T209569 (10Mathew.onipe) p:05Triage>03Normal [09:13:51] 10Operations: issue pulling 1 layer of docker-registry.wikimedia.org/releng/composer-php71:latest - 
https://phabricator.wikimedia.org/T209507 (10fselles) a:03fselles [09:13:53] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473700 (owner: 10Marostegui) [09:14:10] 10Operations, 10Maps, 10Discovery-Search (Current work): Disable proxy for beta cluster in maps - https://phabricator.wikimedia.org/T209570 (10Mathew.onipe) [09:14:34] (03CR) 10jenkins-bot: Revert "mariadb: depool db1088" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473699 (owner: 10Banyek) [09:15:15] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473700 (owner: 10Marostegui) [09:15:29] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1110 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473700 (owner: 10Marostegui) [09:16:08] !log Stop MySQL on db1110 for upgrade [09:16:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:16:16] (03CR) 10Ema: [C: 032] Add module to install and configure fifo-log-demux [puppet] - 10https://gerrit.wikimedia.org/r/473554 (https://phabricator.wikimedia.org/T204225) (owner: 10Ema) [09:16:28] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1110 (duration: 00m 53s) [09:16:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:16:34] (03PS2) 10Ema: trafficserver: configure fifo-log-demux [puppet] - 10https://gerrit.wikimedia.org/r/473555 (https://phabricator.wikimedia.org/T204225) [09:21:55] (03CR) 10Addshore: [C: 04-1] Set Wikibase string-limits (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473694 (https://phabricator.wikimedia.org/T154660) (owner: 10Tarrow) [09:22:21] !log draining ganeti1003 for reboot/kernel security update [09:22:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:23:30] (03CR) 10Ema: [C: 032] trafficserver: configure fifo-log-demux [puppet] - 10https://gerrit.wikimedia.org/r/473555 
(https://phabricator.wikimedia.org/T204225) (owner: 10Ema) [09:25:34] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1110" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473702 [09:25:54] 10Operations: issue pulling 1 layer of docker-registry.wikimedia.org/releng/composer-php71:latest - https://phabricator.wikimedia.org/T209507 (10fselles) According to the registry view everything went fine. ` Nov 14 16:40:03 darmstadtium docker-registry[15598]: time="2018-11-14T16:40:03Z" level=info msg="respo... [09:28:47] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1110" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473702 (owner: 10Marostegui) [09:30:19] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1110" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473702 (owner: 10Marostegui) [09:31:49] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1110 (duration: 00m 52s) [09:31:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:32:46] !log restarting icinga on icinga1001 [09:32:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:38:05] (03PS1) 10Filippo Giunchedi: swift: add statsd mappings for periodic metrics [puppet] - 10https://gerrit.wikimedia.org/r/473704 (https://phabricator.wikimedia.org/T205870) [09:40:37] !log restarting icinga on icinga1001 [09:40:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:42:52] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1110" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473702 (owner: 10Marostegui) [09:43:18] (03CR) 10DCausse: "lgtm! 
just a suggestion for the TODO message" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: 10Mathew.onipe) [09:45:46] PROBLEM - DPKG on ms-be2021 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:46:12] that's me ^ [09:47:46] RECOVERY - DPKG on ms-be2021 is OK: All packages OK [09:48:30] (03PS1) 10Ema: ATS: add atslog [puppet] - 10https://gerrit.wikimedia.org/r/473705 (https://phabricator.wikimedia.org/T204225) [09:48:34] (03CR) 10Filippo Giunchedi: [C: 032] swift: add statsd mappings for periodic metrics [puppet] - 10https://gerrit.wikimedia.org/r/473704 (https://phabricator.wikimedia.org/T205870) (owner: 10Filippo Giunchedi) [09:48:42] (03PS2) 10Filippo Giunchedi: swift: add statsd mappings for periodic metrics [puppet] - 10https://gerrit.wikimedia.org/r/473704 (https://phabricator.wikimedia.org/T205870) [09:51:45] (03PS2) 10Ema: ATS: add atslog [puppet] - 10https://gerrit.wikimedia.org/r/473705 (https://phabricator.wikimedia.org/T204225) [09:53:12] 10Operations, 10monitoring, 10Patch-For-Review: Provision >= 50% of statsd/Graphite-only metrics in Prometheus - https://phabricator.wikimedia.org/T205870 (10fgiunchedi) [09:55:00] (03CR) 10Ema: [C: 032] ATS: add atslog [puppet] - 10https://gerrit.wikimedia.org/r/473705 (https://phabricator.wikimedia.org/T204225) (owner: 10Ema) [09:55:49] 10Operations, 10User-Joe: Gather metrics from php-fpm - https://phabricator.wikimedia.org/T209573 (10Joe) p:05Triage>03High [09:57:34] !log draining ganeti1002 for reboot/kernel security update [09:57:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:00:32] (03PS1) 10Vgutierrez: certcentral: split base path in config and certificates path [software/certcentral] - 10https://gerrit.wikimedia.org/r/473706 (https://phabricator.wikimedia.org/T209475) [10:03:27] (03CR) 10Gehel: [C: 031] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/473213 
(https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [10:03:41] (03CR) 10DCausse: elasticsearch: cookbook for multi-cluster services rolling restart (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/467964 (https://phabricator.wikimedia.org/T207919) (owner: 10Mathew.onipe) [10:05:22] 10Operations, 10Analytics, 10Wikimedia-Logstash, 10Core Platform Team Backlog (Watching / External), and 2 others: Review and make librdkafka-0.11.6 installable from stretch-wikimedia - https://phabricator.wikimedia.org/T209300 (10mobrovac) Mental note for @Ottomata @Pchelolo and myself: the current node-r... [10:07:35] (03CR) 10DCausse: [C: 031] remote: refactor Remote.query() API [software/spicerack] - 10https://gerrit.wikimedia.org/r/473213 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [10:07:37] (03CR) 10Gehel: [C: 031] "Very minor comment inline, feel free to merge as-is or correct and merge without further review." (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/473506 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [10:07:42] !log sanitizing db2094 ( T205714 T207584 T205713 T206916 ) [10:07:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:07:48] T205714: Prepare and check storage layer for yuewiktionary - https://phabricator.wikimedia.org/T205714 [10:07:49] T206916: Prepare and check storage layer for shnwiki - https://phabricator.wikimedia.org/T206916 [10:07:49] T207584: Prepare and check storage layer for punjabiwikimedia - https://phabricator.wikimedia.org/T207584 [10:07:49] T205713: Prepare and check storage layer for liwikinews - https://phabricator.wikimedia.org/T205713 [10:08:04] 10Operations, 10Traffic: ATS production-ready as a backend cache layer - https://phabricator.wikimedia.org/T207048 (10ema) [10:08:07] 10Operations, 10Traffic, 10Patch-For-Review: ATS: log inspection at runtime - https://phabricator.wikimedia.org/T204225 (10ema) 05Open>03Resolved [10:08:22] 
10Operations, 10Traffic: ATS production-ready as a backend cache layer - https://phabricator.wikimedia.org/T207048 (10ema) [10:10:00] PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [10:11:08] RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational [10:12:14] (03PS1) 10Mathew.onipe: prometheus-blazegraph-exporter: added runningQueriesCount metric [debs/prometheus-blazegraph-exporter] - 10https://gerrit.wikimedia.org/r/473707 (https://phabricator.wikimedia.org/T206123) [10:15:31] !log sanitizing db1124 ( T205714 T207584 T205713 T206916 ) [10:15:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:15:38] T205714: Prepare and check storage layer for yuewiktionary - https://phabricator.wikimedia.org/T205714 [10:15:38] T206916: Prepare and check storage layer for shnwiki - https://phabricator.wikimedia.org/T206916 [10:15:38] T207584: Prepare and check storage layer for punjabiwikimedia - https://phabricator.wikimedia.org/T207584 [10:15:39] T205713: Prepare and check storage layer for liwikinews - https://phabricator.wikimedia.org/T205713 [10:15:57] (03CR) 10DCausse: Add Icinga module (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/473506 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [10:16:40] (03PS1) 10Filippo Giunchedi: WIP: hieradata switch all swift statsd traffic to statsd_exporter [puppet] - 10https://gerrit.wikimedia.org/r/473709 (https://phabricator.wikimedia.org/T205870) [10:16:42] (03CR) 10Hashar: "recheck" [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/473432 (https://phabricator.wikimedia.org/T204225) (owner: 10Ema) [10:17:21] (03CR) 10jerkins-bot: [V: 04-1] WIP: hieradata switch all swift statsd traffic to statsd_exporter [puppet] - 10https://gerrit.wikimedia.org/r/473709 (https://phabricator.wikimedia.org/T205870) (owner: 10Filippo Giunchedi) 
[10:20:46] (03PS2) 10Filippo Giunchedi: WIP: hieradata switch all swift statsd traffic to statsd_exporter [puppet] - 10https://gerrit.wikimedia.org/r/473709 (https://phabricator.wikimedia.org/T205870) [10:22:23] 10Operations, 10ops-codfw, 10Core Platform Team (Session Management Service (CDP2)), 10Core Platform Team Backlog (Watching / External), 10Services (watching): rack/setup/install sessionstore200[123].codfw.wmnet - https://phabricator.wikimedia.org/T209389 (10mobrovac) [10:25:43] (03CR) 10Volans: [C: 032] remote: refactor Remote.query() API [software/spicerack] - 10https://gerrit.wikimedia.org/r/473213 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [10:27:50] (03Merged) 10jenkins-bot: remote: refactor Remote.query() API [software/spicerack] - 10https://gerrit.wikimedia.org/r/473213 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [10:28:16] (03CR) 10Filippo Giunchedi: "PCC's happy https://puppet-compiler.wmflabs.org/compiler1001/13510/" [puppet] - 10https://gerrit.wikimedia.org/r/473709 (https://phabricator.wikimedia.org/T205870) (owner: 10Filippo Giunchedi) [10:28:27] (03PS5) 10Volans: Add Icinga module [software/spicerack] - 10https://gerrit.wikimedia.org/r/473506 (https://phabricator.wikimedia.org/T205884) [10:28:29] (03PS3) 10Filippo Giunchedi: hieradata: switch all swift statsd traffic to statsd_exporter [puppet] - 10https://gerrit.wikimedia.org/r/473709 (https://phabricator.wikimedia.org/T205870) [10:29:31] (03CR) 10Volans: "replies inline" (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/473506 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [10:30:39] (03PS1) 10Vgutierrez: debian: Take into account /var/lib/certcentral [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473713 (https://phabricator.wikimedia.org/T209475) [10:34:28] !log set migration_downtime=2000 for puppetdb1001. 
Should help with migration stalls [10:34:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:02] (03PS4) 10Filippo Giunchedi: hieradata: switch all swift statsd traffic to statsd_exporter [puppet] - 10https://gerrit.wikimedia.org/r/473709 (https://phabricator.wikimedia.org/T205870) [10:35:08] (03CR) 10Filippo Giunchedi: [C: 032] hieradata: switch all swift statsd traffic to statsd_exporter [puppet] - 10https://gerrit.wikimedia.org/r/473709 (https://phabricator.wikimedia.org/T205870) (owner: 10Filippo Giunchedi) [10:47:55] akosiaris: what migrations? re: puppetdb1001. [10:48:15] ganeti reboots [10:48:23] ahhhh, got it, thanks ;) [10:48:25] memory transfer between the nodes [10:48:34] which stalls forever with it [10:48:39] I 'll need to reboot it btw [10:48:46] in order for that setting to apply [10:48:50] I thought it was a config in puppetdb and I didn't understand what it was :D [10:48:52] actually lemme do it now [10:49:54] !log fail over ganeti master in eqiad to ganeti1003 [10:49:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:49:59] !log disable puppet across the fleet for puppetdb1001 reboot [10:50:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:56:31] (03CR) 10Ema: [V: 032 C: 032] "recheck" [software/fifo-log-demux] - 10https://gerrit.wikimedia.org/r/473432 (https://phabricator.wikimedia.org/T204225) (owner: 10Ema) [10:57:20] 10Operations: issue pulling 1 layer of docker-registry.wikimedia.org/releng/composer-php71:latest - https://phabricator.wikimedia.org/T209507 (10fselles) that layer looks ok also on storage, either is a caching issue or an nginx issue breaking connections from client, I'll continue to debug it [11:02:04] 10Operations: issue pulling 1 layer of docker-registry.wikimedia.org/releng/composer-php71:latest - https://phabricator.wikimedia.org/T209507 (10fselles) please @Addshore report if happens again with container image and 
date/timestamp of the pulling. [11:03:02] PROBLEM - Check systemd state on labsdb1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [11:03:54] (03PS3) 10Tarrow: Set Wikibase string-limits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473694 (https://phabricator.wikimedia.org/T154660) [11:04:34] !log enable puppet across the fleet. puppetdb1001 reboot done, ganeti migration_downtime setting applied [11:04:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:05:09] (03PS1) 10Filippo Giunchedi: logstash: remove obsolete daily restart cron [puppet] - 10https://gerrit.wikimedia.org/r/473714 [11:10:41] (03PS6) 10Volans: Add Icinga module [software/spicerack] - 10https://gerrit.wikimedia.org/r/473506 (https://phabricator.wikimedia.org/T205884) [11:12:58] (03CR) 10Muehlenhoff: [C: 031] "Looks good, maybe also add a generic "redis" which covers A:eqiad and A:codfw." [puppet] - 10https://gerrit.wikimedia.org/r/473582 (owner: 10Effie Mouzeli) [11:19:33] (03PS2) 10Giuseppe Lavagetto: mediawiki: allow serving content from php7 on canaries [puppet] - 10https://gerrit.wikimedia.org/r/471231 (https://phabricator.wikimedia.org/T206338) [11:23:49] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: allow serving content from php7 on canaries [puppet] - 10https://gerrit.wikimedia.org/r/471231 (https://phabricator.wikimedia.org/T206338) (owner: 10Giuseppe Lavagetto) [11:24:26] (03PS4) 10Tarrow: Set Wikibase string-limits for wikidata dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473694 (https://phabricator.wikimedia.org/T154660) [11:24:28] (03PS1) 10Tarrow: Read WikibaseStringLimit in Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473716 (https://phabricator.wikimedia.org/T154660) [11:26:15] (03CR) 10Tarrow: "This change is ready for review." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/473694 (https://phabricator.wikimedia.org/T154660) (owner: 10Tarrow) [11:26:41] (03CR) 10Tarrow: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473716 (https://phabricator.wikimedia.org/T154660) (owner: 10Tarrow) [11:33:57] (03CR) 10Marostegui: [C: 04-1] "Let's reduce the scope for this first and see what issues we find. Let's only work on the non active ones first" [puppet] - 10https://gerrit.wikimedia.org/r/473546 (https://phabricator.wikimedia.org/T202367) (owner: 10Banyek) [11:34:18] <_joe_> !log depooling mw1261 for early benchmarks of php7.2-fpm [11:34:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:36:13] (03CR) 10Filippo Giunchedi: [C: 032] logstash: remove obsolete daily restart cron [puppet] - 10https://gerrit.wikimedia.org/r/473714 (owner: 10Filippo Giunchedi) [11:36:20] (03PS2) 10Filippo Giunchedi: logstash: remove obsolete daily restart cron [puppet] - 10https://gerrit.wikimedia.org/r/473714 [11:43:07] o/ raynor is it okay if I deploy your patch in the swat windows (i'll be in a call with some others showing them the ropes) :) [11:44:14] yes, sure [11:44:23] amazing :) [11:44:44] it's the 3rd patch, yesterday we did 1%, then 5%, now we're bumping to 25% [11:45:22] on my side it's just a smoke test that everything works properly, the json+ld schema is still present on bucketed articles [11:45:44] (03CR) 10Addshore: [C: 031] "https://tools.wmflabs.org/add/api/gerrit.wikimedia/deployedSites/https%3A%2F%2Fgerrit.wikimedia.org%2Fr%2F%23%2Fc%2Fmediawiki%2Fextensions" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471738 (https://phabricator.wikimedia.org/T207854) (owner: 10Lucas Werkmeister (WMDE)) [11:46:14] (03CR) 10Addshore: [C: 031] Prod: increase Schema.org page split test to 25% sampling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473224 (https://phabricator.wikimedia.org/T208755) (owner: 10Niedzielski) 
[11:46:28] (03CR) 10Addshore: [C: 031] Set Wikibase string-limits for wikidata dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473694 (https://phabricator.wikimedia.org/T154660) (owner: 10Tarrow) [11:47:13] (03CR) 10Addshore: [C: 031] Make AdvancedSearch the default on de-, fa-, ar-, and hu-wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473440 (https://phabricator.wikimedia.org/T207640) (owner: 10WMDE-Fisch) [11:50:17] 10Operations, 10cloud-services-team (Kanban): Reboot WMCS servers for L1TF - https://phabricator.wikimedia.org/T207377 (10aborrero) [11:50:46] 10Operations, 10cloud-services-team (Kanban): Reboot WMCS servers for L1TF - https://phabricator.wikimedia.org/T207377 (10aborrero) [11:54:36] jouncebot: Next [11:54:36] In 0 hour(s) and 5 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181115T1200) [12:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: That opportune time is upon us again. Time for a European Mid-day SWAT(Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181115T1200). [12:00:04] CFisch_WMDE, tarrow, Lucas_WMDE, and raynor: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:00:08] \o I will be running EU swat today with tarrow and Lucas_WMDE :) [12:00:10] o/ [12:00:20] o/ [12:00:33] CFisch_WMDE: around? :) [12:00:44] I'm around, in case I'm needed [12:00:56] o/ [12:00:56] zeljkof: thanks! 
[12:01:00] but it's nice to see the swat taking care of itself ;) [12:01:19] my job is done here, all deployers can and know how to deploy [12:01:25] * zeljkof rides in the sunset [12:01:29] haha [12:01:34] 10Operations, 10Puppet, 10Cloud-Services, 10cloud-services-team (Kanban): Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188 (10aborrero) I will try to discuss this in our next team meeting. [12:01:39] (03PS2) 10Addshore: Make AdvancedSearch the default on de-, fa-, ar-, and hu-wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473440 (https://phabricator.wikimedia.org/T207640) (owner: 10WMDE-Fisch) [12:01:44] (03CR) 10Addshore: [C: 032] Make AdvancedSearch the default on de-, fa-, ar-, and hu-wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473440 (https://phabricator.wikimedia.org/T207640) (owner: 10WMDE-Fisch) [12:02:40] addshore: [12:02:44] \o/ [12:02:44] \o [12:03:14] (03Merged) 10jenkins-bot: Make AdvancedSearch the default on de-, fa-, ar-, and hu-wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473440 (https://phabricator.wikimedia.org/T207640) (owner: 10WMDE-Fisch) [12:03:16] PROBLEM - puppet last run on icinga1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:04:55] (03CR) 10jenkins-bot: Make AdvancedSearch the default on de-, fa-, ar-, and hu-wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473440 (https://phabricator.wikimedia.org/T207640) (owner: 10WMDE-Fisch) [12:05:05] CFisch_WMDE: it is on mwdebug1002 [12:05:09] please check :) [12:05:50] 10Operations, 10Puppet, 10Cloud-Services, 10cloud-services-team (Kanban): Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188 (10aborrero) Noob question: I understand that cloudinfra-puppetmaster-01 is a puppetmaster just for the cloudinfra project, right? Are we... 
[12:06:16] addshore: nice [12:06:21] works [12:08:35] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:473440|Make AdvancedSearch the default on de-, fa-, ar-, and hu-wiki]] T207640 (duration: 00m 55s) [12:08:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:38] T207640: Make AdvancedSearch default on early adopter wikis - https://phabricator.wikimedia.org/T207640 [12:08:54] CFisch_WMDE: done! [12:09:19] Nice [12:09:21] Works [12:09:22] next [12:09:51] CFisch_WMDE: the other 2 changes are still merging so we will come back to them [12:09:52] backports aren’t merged yet, so we’ll go ahead with the config changes while the backports make their way through CI [12:11:04] (03PS5) 10Tarrow: Set Wikibase string-limits for wikidata dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473694 (https://phabricator.wikimedia.org/T154660) [12:11:42] (03CR) 10Tarrow: [C: 032] "swat" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473694 (https://phabricator.wikimedia.org/T154660) (owner: 10Tarrow) [12:11:57] (03PS2) 10Tarrow: Read WikibaseStringLimit in Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473716 (https://phabricator.wikimedia.org/T154660) [12:13:11] (03Merged) 10jenkins-bot: Set Wikibase string-limits for wikidata dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473694 (https://phabricator.wikimedia.org/T154660) (owner: 10Tarrow) [12:21:05] (03CR) 10Tarrow: [C: 032] Read WikibaseStringLimit in Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473716 (https://phabricator.wikimedia.org/T154660) (owner: 10Tarrow) [12:21:20] (03CR) 10jenkins-bot: Set Wikibase string-limits for wikidata dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473694 (https://phabricator.wikimedia.org/T154660) (owner: 10Tarrow) [12:21:28] !log tarrow@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:473694]] Set Wikibase string-limits for 
wikidata dblist T154660 (duration: 00m 54s) [12:21:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:32] T154660: Increase length limit for external identifier, string and URL datatype - https://phabricator.wikimedia.org/T154660 [12:22:24] (03Merged) 10jenkins-bot: Read WikibaseStringLimit in Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473716 (https://phabricator.wikimedia.org/T154660) (owner: 10Tarrow) [12:22:39] (03CR) 10jenkins-bot: Read WikibaseStringLimit in Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473716 (https://phabricator.wikimedia.org/T154660) (owner: 10Tarrow) [12:26:34] !log tarrow@deploy1001 sync-file aborted: [[gerrit:473716]] Read WikibaseStringLimit in Wikibase.php (duration: 00m 01s) [12:26:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:27:04] err.. that was good timing [12:27:31] tarrow: your muted tom? [12:30:20] !log draining ganeti1001 for reboot/kernel security update [12:30:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:30:45] !log tarrow@deploy1001 Synchronized wmf-config/Wikibase.php: [[gerrit:473716]] Read WikibaseStringLimit in Wikibase.php T154660 (duration: 00m 53s) [12:30:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:30:49] T154660: Increase length limit for external identifier, string and URL datatype - https://phabricator.wikimedia.org/T154660 [12:31:07] CFisch_WMDE: we’re going ahead with the backports now, I’ll ping you once they’re ready to test [12:32:28] Lucas_WMDE: kewl [12:34:11] CFisch_WMDE: your change is on mwdebug1002 for wmf.3, please test [12:34:27] yep [12:35:05] addshore, whatETA on the schema.org stuff? 
[12:35:09] looks good thanks Lucas_WMDE [12:35:12] can go live [12:35:13] what's the eta* [12:35:14] alright, going ahead [12:35:27] raynor: I can do it in around 5-10 mins I think, we can just get this other backport out first :) [12:35:30] I can wait, it's ok, I'm just asking when can I expect that thing [12:35:37] sure, take your time, I' [12:35:51] 10Operations, 10cloud-services-team (Kanban): Reboot WMCS servers for L1TF - https://phabricator.wikimedia.org/T207377 (10aborrero) I think that by rebooting `labmon1001.eqiad.wmnet` we will have just a brief gap in metrics/graphs, which is not a big deal. I will reboot it now. [12:36:24] I'm around (damn, I didn't use my mechanical keyboard for like a month and I'm randomly spamming wrong buttons, and usually that button is enter ;/) [12:37:06] !log lucaswerkmeister-wmde@deploy1001 Started scap: php-1.33.0-wmf.3/extensions/RevisionSlider/modules/ext.RevisionSlider.SliderView.js [[gerrit:473710|Fix (accidentally?) reversed blue and yellow lines (T162119, T208238)]] [12:37:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:37:11] T162119: Add keyboard shortcuts to move between revisions - https://phabricator.wikimedia.org/T162119 [12:37:11] T208238: [bug] blue and yellow line are reversed - https://phabricator.wikimedia.org/T208238 [12:37:17] !log lucaswerkmeister-wmde@deploy1001 sync aborted: php-1.33.0-wmf.3/extensions/RevisionSlider/modules/ext.RevisionSlider.SliderView.js [[gerrit:473710|Fix (accidentally?) 
reversed blue and yellow lines (T162119, T208238)]] (duration: 00m 11s) [12:37:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:38:04] :D [12:38:09] sorry for the duplicate log message, I accidentally did scap sync instead of sync-file [12:38:22] should be good now though [12:38:33] !log lucaswerkmeister-wmde@deploy1001 Synchronized php-1.33.0-wmf.3/extensions/RevisionSlider/modules/ext.RevisionSlider.SliderView.js: [[gerrit:473710|Fix (accidentally?) reversed blue and yellow lines (T162119, T208238)]] (duration: 00m 54s) [12:38:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:38:59] raynor: gonna do yours now [12:39:09] (03PS4) 10Addshore: Prod: increase Schema.org page split test to 25% sampling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473224 (https://phabricator.wikimedia.org/T208755) (owner: 10Niedzielski) [12:39:11] \o/ [12:39:13] (03CR) 10Addshore: [C: 032] Prod: increase Schema.org page split test to 25% sampling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473224 (https://phabricator.wikimedia.org/T208755) (owner: 10Niedzielski) [12:40:10] raynor: will you want to test on a debug server or not? 
[12:40:19] I think we can safely skip it [12:40:25] ack :) [12:40:32] (03Merged) 10jenkins-bot: Prod: increase Schema.org page split test to 25% sampling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473224 (https://phabricator.wikimedia.org/T208755) (owner: 10Niedzielski) [12:40:44] !log T207377 downtime and reboot labmon1001 [12:40:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:40:47] T207377: Reboot WMCS servers for L1TF - https://phabricator.wikimedia.org/T207377 [12:41:17] raynor: syncing [12:42:04] 10Operations, 10cloud-services-team (Kanban): Reboot WMCS servers for L1TF - https://phabricator.wikimedia.org/T207377 (10aborrero) [12:42:07] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:473224|Prod: increase Schema.org page split test to 25% sampling]] T208755 (duration: 00m 53s) [12:42:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:42:10] T208755: Launch A/B test for sameAs property - https://phabricator.wikimedia.org/T208755 [12:42:19] raynor: all done [12:42:34] (03PS2) 10Lucas Werkmeister (WMDE): Remove wgWBQualityConstraintsCacheCheckConstraintsResults [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471738 (https://phabricator.wikimedia.org/T207854) [12:42:36] ok, let me check [12:42:40] CFisch_WMDE: tarrow will do your .4 change in a short while once he has recovered from sending his laptop to sleep [12:42:50] (03CR) 10Lucas Werkmeister (WMDE): [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471738 (https://phabricator.wikimedia.org/T207854) (owner: 10Lucas Werkmeister (WMDE)) [12:43:03] addshore: I don't know if I can test it but good ^^ [12:43:03] in the meantime I’m deploying another config change [12:43:16] f2k1de: coolio :) we can just sync it straight out then [12:43:20] addshore: should be fine though [12:43:34] oop, meant to ping CFisch_WMDE there.... 
[12:44:04] (03Merged) 10jenkins-bot: Remove wgWBQualityConstraintsCacheCheckConstraintsResults [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471738 (https://phabricator.wikimedia.org/T207854) (owner: 10Lucas Werkmeister (WMDE)) [12:45:01] ok, looks like it works properly. Thank you addshore [12:45:20] I'll keep testing but I think it's safe to say "it's alive...." [12:45:44] raynor: amazing! :) [12:45:45] (03PS1) 10Alexandros Kosiaris: Introduce zoterov2 LVS IPs [dns] - 10https://gerrit.wikimedia.org/r/473727 (https://phabricator.wikimedia.org/T201611) [12:46:35] (03PS1) 10Alexandros Kosiaris: Remove LVS IP assignments for ocg [dns] - 10https://gerrit.wikimedia.org/r/473728 [12:47:30] !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:471738|Remove wgWBQualityConstraintsCacheCheckConstraintsResults (T207854)]] (duration: 00m 54s) [12:47:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:47:33] T207854: Set QualityConstraints global WBQualityConstraintsCacheCheckConstraintsResults default true - https://phabricator.wikimedia.org/T207854 [12:48:30] CFisch_WMDE: working on your .4 one now [12:48:34] fyi [12:48:48] +1 [12:49:20] RECOVERY - puppet last run on icinga1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [12:51:43] (03CR) 10jenkins-bot: Prod: increase Schema.org page split test to 25% sampling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473224 (https://phabricator.wikimedia.org/T208755) (owner: 10Niedzielski) [12:51:45] (03CR) 10jenkins-bot: Remove wgWBQualityConstraintsCacheCheckConstraintsResults [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471738 (https://phabricator.wikimedia.org/T207854) (owner: 10Lucas Werkmeister (WMDE)) [12:52:10] !log mobrovac@deploy1001 Started deploy [restbase/deploy@22cb0ec]: Add new wikis to RESTBase - T206777 T205710 T205546 T204477 [12:52:17] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:52:18] T205546: Create Wiktionary Cantonese - https://phabricator.wikimedia.org/T205546 [12:52:19] T204477: Create punjabi.wikimedia.org for Punjabi Wikimedians User Group - https://phabricator.wikimedia.org/T204477 [12:52:19] T205710: Create Wikinews Limburgish - https://phabricator.wikimedia.org/T205710 [12:52:20] T206777: Create Wikipedia Shan - https://phabricator.wikimedia.org/T206777 [12:52:48] jouncebot: next [12:52:48] In 0 hour(s) and 7 minute(s): Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181115T1300) [12:52:56] [= [12:53:00] on the last swat change! [12:53:59] !log installing nginx security updates [12:54:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:37] !log tarrow@deploy1001 scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) [12:56:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:46] interesting [12:57:37] <_joe_> !log upping pm.maxworkers to 40 on mw1261 on php7.2-fpm, benchmarking T206341 [12:57:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:57:40] T206341: Evaluate scalability and performance of PHP7 compared to HHVM - https://phabricator.wikimedia.org/T206341 [12:59:48] !log tarrow@deploy1001 Synchronized php-1.33.0-wmf.4/extensions/RevisionSlider/modules/ext.RevisionSlider.SliderView.js: [[gerrit:473710]] Fix (accidentally?) 
reversed blue and yellow lines SWAT T208238 T162119 again (duration: 00m 55s) [12:59:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:59:51] T162119: Add keyboard shortcuts to move between revisions - https://phabricator.wikimedia.org/T162119 [12:59:52] T208238: [bug] blue and yellow line are reversed - https://phabricator.wikimedia.org/T208238 [13:00:05] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181115T1300) [13:00:05] CFisch_WMDE: ^^ [13:00:08] !log EU SWAT finished [13:00:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:11] jouncebot: now [13:00:11] For the next 0 hour(s) and 59 minute(s): Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181115T1300) [13:01:39] RECOVERY - mysqld processes on pc2009 is OK: PROCS OK: 1 process with command name mysqld [13:01:41] (03PS1) 10Filippo Giunchedi: hieradata: send periodic swift stats to localhost [puppet] - 10https://gerrit.wikimedia.org/r/473729 (https://phabricator.wikimedia.org/T205870) [13:02:23] recovery page with no outage page? hrm [13:03:46] 10Operations, 10Performance-Team: Evaluate scalability and performance of PHP7 compared to HHVM - https://phabricator.wikimedia.org/T206341 (10Joe) >>! In T206341#4704564, @Imarlier wrote: > Is there anything specific being asked of the Performance Team, or is this something that @Joe (or others) were planning... 
[13:03:52] RECOVERY - MariaDB Slave SQL: pc3 on pc2009 is OK: OK slave_sql_state Slave_SQL_Running: Yes [13:04:24] RECOVERY - MariaDB Slave IO: pc3 on pc2009 is OK: OK slave_io_state Slave_IO_Running: Yes [13:04:37] I am wondering why is the train so late in the afternoon bah [13:05:59] 10Operations, 10cloud-services-team (Kanban): Reboot WMCS servers for L1TF - https://phabricator.wikimedia.org/T207377 (10aborrero) Scheduled reboot for cloudcontrol1003, cloudservices1003, labcontrol1001 and labservices1001 for next **Monday 2018-11-19 at 13:00 UTC**. Email sent to cloud-announce. [13:07:48] apergos: the outage page was from days ago I reckon [13:08:32] guess so [13:08:34] we just had a large burst of mediawiki errors at 13:06 . [13:08:47] (03CR) 10Filippo Giunchedi: [C: 032] hieradata: send periodic swift stats to localhost [puppet] - 10https://gerrit.wikimedia.org/r/473729 (https://phabricator.wikimedia.org/T205870) (owner: 10Filippo Giunchedi) [13:09:23] stuff like ErrorException from line 1308 of /srv/mediawiki/php-1.33.0-wmf.3/includes/Message.php: PHP Notice: Undefined variable: message [13:09:30] or ErrorException from line 1303 of /srv/mediawiki/php-1.33.0-wmf.3/includes/Message.php: PHP Warning: Invalid argument supplied for foreach() [13:11:43] (03PS1) 10Mathew.onipe: maps: added use_proxy flag to set proxy [puppet] - 10https://gerrit.wikimedia.org/r/473731 (https://phabricator.wikimedia.org/T209570) [13:12:06] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@22cb0ec]: Add new wikis to RESTBase - T206777 T205710 T205546 T204477 (duration: 19m 56s) [13:12:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:12:14] T205546: Create Wiktionary Cantonese - https://phabricator.wikimedia.org/T205546 [13:12:15] T204477: Create punjabi.wikimedia.org for Punjabi Wikimedians User Group - https://phabricator.wikimedia.org/T204477 [13:12:17] T205710: Create Wikinews Limburgish - https://phabricator.wikimedia.org/T205710 
[13:12:17] T206777: Create Wikipedia Shan - https://phabricator.wikimedia.org/T206777 [13:13:32] <_joe_> hashar: from which server? [13:13:42] <_joe_> hashar: if it's mw1261, then it's my fault [13:13:52] (03PS1) 10Marostegui: site.pp: Adjust regex for parsercache [puppet] - 10https://gerrit.wikimedia.org/r/473732 (https://phabricator.wikimedia.org/T208383) [13:13:54] <_joe_> I'm benchmarking php 7.2 vs HHVM [13:14:35] (03PS2) 10Marostegui: site.pp: Adjust regex for parsercache [puppet] - 10https://gerrit.wikimedia.org/r/473732 (https://phabricator.wikimedia.org/T208383) [13:15:21] (03PS3) 10Marostegui: site.pp: Adjust regex for parsercache [puppet] - 10https://gerrit.wikimedia.org/r/473732 (https://phabricator.wikimedia.org/T208383) [13:16:42] _joe_: all servers. Seems a memcached value got corrupted somehow [13:16:48] <_joe_> actually no, it doesn't have anything to do with my issue. It looks like a corrupted serialized object made it to the caches [13:16:51] [{exception_id}] {exception_url} ErrorException from line 302 of /srv/mediawiki/php-1.33.0-wmf.3/includes/Message.php: PHP Notice: Unable to unserialize: [a:8:{s:9:"interface";b:1;s:8:"language";s:2:"en";s:3:"key";s:30:"wikimediashoplink-link-tooltip";s [13:17:04] <_joe_> hashar: eheh indeed [13:17:20] <_joe_> can you see what's wrong there in more detail? [13:17:50] https://logstash.wikimedia.org/goto/5d4bdd746036a01e3c2dc9d85558f963 [13:18:13] PHP Notice: Unable to unserialize: [a:8:{s:9:"interface";b:1;s:8:"language";s:2:"en";s:3:"key";s:30:"wikimediashoplink-link-tooltip";s:9:"keysToTry";a:1:{i:0;s:30:"wikimediashoplink-link-tooltip";}s:10:"parameters";a:0:{}s:6:"format";s:5:"parse";s:11:"useDatabase";b:1;s:5:"title";r:46;}]. Id 46 out of range. 
[13:18:21] (03PS1) 10Alexandros Kosiaris: Introduce zoterov2 LVS endpoint [puppet] - 10https://gerrit.wikimedia.org/r/473733 (https://phabricator.wikimedia.org/T201611) [13:18:42] (03PS4) 10Marostegui: site.pp: Adjust regex for parsercache [puppet] - 10https://gerrit.wikimedia.org/r/473732 (https://phabricator.wikimedia.org/T208383) [13:19:32] who knows what a r:46 stands for [13:20:28] (03CR) 10Marostegui: "https://puppet-compiler.wmflabs.org/compiler1002/13514/" [puppet] - 10https://gerrit.wikimedia.org/r/473732 (https://phabricator.wikimedia.org/T208383) (owner: 10Marostegui) [13:21:52] (03PS1) 10Vgutierrez: lvs: Replace lvs2006 with lvs2010 [puppet] - 10https://gerrit.wikimedia.org/r/473734 [13:22:02] (03CR) 10Mathew.onipe: "PCC is happy: https://puppet-compiler.wmflabs.org/compiler1001/13515/" [puppet] - 10https://gerrit.wikimedia.org/r/473731 (https://phabricator.wikimedia.org/T209570) (owner: 10Mathew.onipe) [13:22:08] (03CR) 10Vgutierrez: [C: 04-2] "router configuration still missing" [puppet] - 10https://gerrit.wikimedia.org/r/473734 (owner: 10Vgutierrez) [13:22:46] (03CR) 10jerkins-bot: [V: 04-1] lvs: Replace lvs2006 with lvs2010 [puppet] - 10https://gerrit.wikimedia.org/r/473734 (owner: 10Vgutierrez) [13:23:39] (03PS2) 10Vgutierrez: lvs: Replace lvs2006 with lvs2010 [puppet] - 10https://gerrit.wikimedia.org/r/473734 (https://phabricator.wikimedia.org/T209337) [13:24:01] hashar: evil corruption? [13:25:29] hashar: but just 1 spike, hmmm [13:25:33] I filed it as https://phabricator.wikimedia.org/T209582 [13:25:34] (03CR) 10Gehel: [C: 04-1] maps: added use_proxy flag to set proxy (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/473731 (https://phabricator.wikimedia.org/T209570) (owner: 10Mathew.onipe) [13:25:54] what is suspicious to me is the [r:46;] Id 46 out of range [13:26:23] I can't see what the r stands for [13:27:01] possibly resource?
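On the r:46 question: in PHP's serialization format, `r:N;` and `R:N;` are back-references to the Nth value already seen in the same serialized stream, not a resource type. The Python sketch below is a rough illustration only, not MediaWiki or PHP code. It walks the payload from the logged notice, counts the value slots, and collects the back-reference ids. The helper name `count_slots` is hypothetical, and the slot-numbering rule (every value except `R:` takes a slot, array keys do not) is an assumption about how the unserializer numbers values. It only handles the tags that appear in this payload and treats string lengths as character counts, which is safe here because the sample is ASCII.

```python
# Payload copied from the "Unable to unserialize" notice in the log.
LOGGED = (
    'a:8:{s:9:"interface";b:1;s:8:"language";s:2:"en";s:3:"key";'
    's:30:"wikimediashoplink-link-tooltip";s:9:"keysToTry";'
    'a:1:{i:0;s:30:"wikimediashoplink-link-tooltip";}'
    's:10:"parameters";a:0:{}s:6:"format";s:5:"parse";'
    's:11:"useDatabase";b:1;s:5:"title";r:46;}'
)


def count_slots(data):
    """Return (number of value slots, list of back-reference ids)."""
    pos = 0
    slots = 0
    refs = []

    def read_until(ch):
        # Read up to (not including) ch and step past it.
        nonlocal pos
        end = data.index(ch, pos)
        token = data[pos:end]
        pos = end + 1
        return token

    def value(is_key=False):
        nonlocal pos, slots
        tag = data[pos]
        # Assumed rule: keys and "R:" references do not take a slot.
        if not is_key and tag != "R":
            slots += 1
        if tag == "N":                      # N;
            pos += 2
        elif tag in "bid":                  # b:1;  i:0;  d:0.5;
            pos += 2
            read_until(";")
        elif tag == "s":                    # s:LEN:"...";
            pos += 2
            length = int(read_until(":"))
            pos += 1 + length + 2           # quote, payload, quote and ';'
        elif tag == "a":                    # a:COUNT:{key value ...}
            pos += 2
            count = int(read_until(":"))
            pos += 1                        # '{'
            for _ in range(count):
                value(is_key=True)
                value()
            pos += 1                        # '}'
        elif tag in "rR":                   # r:ID;  R:ID;
            pos += 2
            refs.append(int(read_until(";")))
        else:
            raise ValueError("unsupported tag %r at offset %d" % (tag, pos))

    value()
    return slots, refs


slots, refs = count_slots(LOGGED)
print(slots, refs)
```

Under that counting the payload holds only about ten values, so id 46 cannot resolve inside this string on its own, which matches the "Id 46 out of range" message. One possible explanation, an assumption the log does not confirm, is that the fragment was cut out of a larger serialized structure and still uses that structure's numbering.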
[13:29:04] (03PS7) 10Volans: Add Icinga module [software/spicerack] - 10https://gerrit.wikimedia.org/r/473506 (https://phabricator.wikimedia.org/T205884) [13:29:04] R is the index of the created sub-variable ? [13:29:06] (03PS1) 10Volans: Add Puppet module [software/spicerack] - 10https://gerrit.wikimedia.org/r/473735 (https://phabricator.wikimedia.org/T205884) [13:30:00] https://secure.php.net/manual/en/function.unserialize.php#73585 :P [13:30:41] (03PS1) 10Mathew.onipe: maps: update SQL script location for kartotherian [puppet] - 10https://gerrit.wikimedia.org/r/473736 (https://phabricator.wikimedia.org/T209566) [13:30:52] (03CR) 10Banyek: [C: 031] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/473732 (https://phabricator.wikimedia.org/T208383) (owner: 10Marostegui) [13:31:09] (03PS5) 10Marostegui: site.pp: Adjust regex for parsercache [puppet] - 10https://gerrit.wikimedia.org/r/473732 (https://phabricator.wikimedia.org/T208383) [13:31:55] (03CR) 10Marostegui: [C: 032] site.pp: Adjust regex for parsercache [puppet] - 10https://gerrit.wikimedia.org/r/473732 (https://phabricator.wikimedia.org/T208383) (owner: 10Marostegui) [13:32:01] (03CR) 10Volans: "First iteration to unblock the ES patch, additional features will be added soon in separate CRs" [software/spicerack] - 10https://gerrit.wikimedia.org/r/473735 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [13:34:56] RECOVERY - puppet last run on pc2009 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [13:43:32] RECOVERY - Check systemd state on labsdb1007 is OK: OK - running: The system is fully operational [13:44:46] Lucas_WMDE: seems like a variable reference :((( [13:46:31] well really no [13:46:34] I have no idea [13:48:42] !log installing qemu security updates (which also backport support for SSBD passthrough) [13:48:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:48:51] !log installing qemu security updates (which also 
backport support for SSBD passthrough) on ganeti clusters [13:48:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:45] !log Deploy schema change on s6 codfw master, this will generate lag on s6 codfw - T85757 [13:50:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:49] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 [13:53:09] going to run the train [13:54:12] (03PS1) 10Hashar: all wikis to 1.33.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473738 [13:54:14] (03CR) 10Hashar: [C: 032] all wikis to 1.33.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473738 (owner: 10Hashar) [13:54:28] (03CR) 10Gehel: [C: 04-1] "minor comments inline" (034 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/473735 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [13:55:24] !log push firewall policies to pfw3-eqiad - T209421 [13:55:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:55:33] (03Merged) 10jenkins-bot: all wikis to 1.33.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473738 (owner: 10Hashar) [13:59:38] !log hashar@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.4 [13:59:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:04] hashar: How many deployers does it take to do MediaWiki train - European version deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181115T1400). 
[14:00:48] banyek: db2046 replication broken [14:01:24] (03PS2) 10Mathew.onipe: maps: added use_proxy flag to set proxy [puppet] - 10https://gerrit.wikimedia.org/r/473731 (https://phabricator.wikimedia.org/T209570) [14:01:26] (03PS2) 10Mathew.onipe: maps: update SQL script location for kartotherian [puppet] - 10https://gerrit.wikimedia.org/r/473736 (https://phabricator.wikimedia.org/T209566) [14:01:39] marostegui: I'll fix it [14:01:45] I know what happened [14:01:46] PROBLEM - MariaDB Slave SQL: s6 on db2046 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1091, Errmsg: Error Cant DROP user_options: check that column/key exists on query. Default database: frwiki. [Query snipped] [14:01:47] I am in a meeting [14:02:20] (03CR) 10Mathew.onipe: maps: added use_proxy flag to set proxy (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/473731 (https://phabricator.wikimedia.org/T209570) (owner: 10Mathew.onipe) [14:02:23] np [14:05:01] ACKNOWLEDGEMENT - MariaDB Slave SQL: s6 on db2046 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1091, Errmsg: Error Cant DROP user_options: check that column/key exists on query. Default database: frwiki. 
[Query snipped] Banyek T85757#4750096 [14:05:04] PROBLEM - MariaDB Slave SQL: s6 on db2095 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1054, Errmsg: Could not execute Update_rows_v1 event on table frwiki.user: Unknown column user_options in NEW, Error_code: 1054: handler error HA_ERR_GENERIC: the events master log db2076-bin.001204, end_log_pos 1003886861 [14:05:47] (03CR) 10jenkins-bot: all wikis to 1.33.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473738 (owner: 10Hashar) [14:06:01] (03CR) 10Gehel: [C: 04-1] Add Puppet module (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/473735 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [14:07:25] !log plugin and JVM upgrade on elasticsearch / cirrus / eqiad completed - T209293 [14:07:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:07:28] T209293: Prepare a deb package with the experimental highlighter 5.5.2.4 - https://phabricator.wikimedia.org/T209293 [14:10:02] PROBLEM - MariaDB Slave Lag: s6 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 682.33 seconds [14:10:48] PROBLEM - MariaDB Slave Lag: s6 on db2046 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 729.94 seconds [14:11:03] 1.33.0-wmf.4 seems good to me [14:11:10] I am going to commute back home [14:12:00] RECOVERY - MariaDB Slave SQL: s6 on db2046 is OK: OK slave_sql_state Slave_SQL_Running: Yes [14:14:11] ACKNOWLEDGEMENT - MariaDB Slave Lag: s6 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 885.14 seconds Banyek T85757#4750096 [14:14:11] ACKNOWLEDGEMENT - MariaDB Slave SQL: s6 on db2095 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1054, Errmsg: Could not execute Update_rows_v1 event on table frwiki.user: Unknown column user_options in NEW, Error_code: 1054: handler error HA_ERR_GENERIC: the events master log db2076-bin.001204, end_log_pos 1003886861 Banyek T85757#4750096 [14:15:18] RECOVERY - 
MariaDB Slave Lag: s6 on db2046 is OK: OK slave_sql_lag Replication lag: 0.45 seconds [14:17:30] RECOVERY - MariaDB Slave SQL: s6 on db2095 is OK: OK slave_sql_state Slave_SQL_Running: Yes [14:19:17] (03CR) 10Sbisson: "If you make this 2 patches (one for betalabs and one for prod) we can merge the betalabs one right away and test it before we SWAT the oth" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473653 (https://phabricator.wikimedia.org/T207307) (owner: 10Kosta Harlan) [14:20:01] so the testing of dropping the `user_options` column from the `user` table was finished, and when I started the alter on the codfw master with replication enabled, it broke on db2046 (where the test was made) because the column was already gone there. I added the column back (with sql_log_bin=0) and restarted replication, which then dropped the column normally. [14:20:28] On the next run I have to use 'DROP COLUMN IF EXISTS' instead of 'DROP COLUMN' [14:20:44] !log anomie@mwmaint1002 Running refreshExternallinksIndex.php on section 2 wikis in group 1 for T209373. This may cause lag in codfw. [14:20:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:20:47] T209373: Run maintenance/refreshExternallinksIndex.php on all wikis - https://phabricator.wikimedia.org/T209373 [14:20:52] PROBLEM - MariaDB Slave SQL: s6 on db2095 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1054, Errmsg: Could not execute Update_rows_v1 event on table jawiki.user: Unknown column user_options in NEW, Error_code: 1054: handler error HA_ERR_GENERIC: the events master log db2076-bin.001204, end_log_pos 1005807062 [14:21:19] !log anomie@mwmaint1002 Running refreshExternallinksIndex.php on section 3 wikis in group 1 for T209373. This may cause lag in codfw. [14:21:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:29] !log anomie@mwmaint1002 Running refreshExternallinksIndex.php on section 4 wikis in group 1 for T209373. 
This may cause lag in codfw. [14:21:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:36] !log anomie@mwmaint1002 Running refreshExternallinksIndex.php on section 5 wikis in group 1 for T209373. This may cause lag in codfw. [14:21:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:39] !log anomie@mwmaint1002 Running refreshExternallinksIndex.php on section 7 wikis in group 1 for T209373. This may cause lag in codfw. [14:21:41] !log anomie@mwmaint1002 Running refreshExternallinksIndex.php on section 8 wikis in group 1 for T209373. This may cause lag in codfw. [14:21:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:42] !log anomie@mwmaint1002 Running refreshExternallinksIndex.php on wikitech for T209373. This may cause lag in codfw. [14:21:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:22:00] RECOVERY - MariaDB Slave SQL: s6 on db2095 is OK: OK slave_sql_state Slave_SQL_Running: Yes [14:24:51] 10Operations, 10netops: asw2-a-eqiad FPC2 reboot - https://phabricator.wikimedia.org/T209588 (10ayounsi) p:05Triage>03Normal [14:25:14] (03PS4) 10Kosta Harlan: Beta labs: Configure sensitive namespaces for EditorJourney schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473653 (https://phabricator.wikimedia.org/T207307) [14:25:17] (03CR) 10DCausse: "just a suggestion feel free to ignore" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/473735 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [14:25:50] RECOVERY - MariaDB Slave Lag: s6 on db2095 is OK: OK slave_sql_lag Replication lag: 4.29 seconds [14:26:59] (03PS1) 10Kosta Harlan: Configure sensitive namespaces for EditorJourney schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473742 (https://phabricator.wikimedia.org/T207307) [14:27:12] (03CR) 
10Kosta Harlan: "> Patch Set 3:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473653 (https://phabricator.wikimedia.org/T207307) (owner: 10Kosta Harlan) [14:41:59] 10Operations, 10Performance-Team: Evaluate scalability and performance of PHP7 compared to HHVM - https://phabricator.wikimedia.org/T206341 (10Imarlier) >>! In T206341#4749928, @Joe wrote: >>>! In T206341#4704564, @Imarlier wrote: >> Is there anything specific being asked of the Performance Team, or is this so... [14:45:43] (03PS3) 10Andrew Bogott: Change static IPs to host names [puppet] - 10https://gerrit.wikimedia.org/r/472507 (https://phabricator.wikimedia.org/T208262) (owner: 1020after4) [14:47:39] (03CR) 1020after4: [C: 031] "this has been cherry picked for a while on beta and I haven't seen any problems, so I think it's safe to merge." [puppet] - 10https://gerrit.wikimedia.org/r/472507 (https://phabricator.wikimedia.org/T208262) (owner: 1020after4) [14:48:28] (03CR) 10Andrew Bogott: [C: 032] Change static IPs to host names [puppet] - 10https://gerrit.wikimedia.org/r/472507 (https://phabricator.wikimedia.org/T208262) (owner: 1020after4) [14:55:27] (03Abandoned) 10Herron: create rsyslog::ship_logfile - simplified logstash shipper via kafka [puppet] - 10https://gerrit.wikimedia.org/r/469945 (https://phabricator.wikimedia.org/T206454) (owner: 10Herron) [14:58:31] (03CR) 10Alex Monk: [C: 032] certcentral: split base path in config and certificates path [software/certcentral] - 10https://gerrit.wikimedia.org/r/473706 (https://phabricator.wikimedia.org/T209475) (owner: 10Vgutierrez) [14:59:06] (03PS1) 10Alex Monk: certcentral: split base path in config and certificates path [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473751 (https://phabricator.wikimedia.org/T209475) [15:00:29] (03Merged) 10jenkins-bot: certcentral: split base path in config and certificates path [software/certcentral] - 10https://gerrit.wikimedia.org/r/473706 (https://phabricator.wikimedia.org/T209475) 
(owner: 10Vgutierrez) [15:02:47] (03CR) 10Alex Monk: [C: 031] "good to go when dependency is merged to this branch" [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473713 (https://phabricator.wikimedia.org/T209475) (owner: 10Vgutierrez) [15:03:09] (03CR) 10jenkins-bot: certcentral: split base path in config and certificates path [software/certcentral] - 10https://gerrit.wikimedia.org/r/473706 (https://phabricator.wikimedia.org/T209475) (owner: 10Vgutierrez) [15:06:15] (03CR) 10Vgutierrez: [C: 032] certcentral: split base path in config and certificates path [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473751 (https://phabricator.wikimedia.org/T209475) (owner: 10Alex Monk) [15:06:24] (03CR) 10Vgutierrez: [C: 032] debian: Take into account /var/lib/certcentral [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473713 (https://phabricator.wikimedia.org/T209475) (owner: 10Vgutierrez) [15:06:32] (03PS1) 10Filippo Giunchedi: swift: lookup statsd host/port from hiera for stats_container [puppet] - 10https://gerrit.wikimedia.org/r/473753 [15:07:41] (03CR) 10jerkins-bot: [V: 04-1] swift: lookup statsd host/port from hiera for stats_container [puppet] - 10https://gerrit.wikimedia.org/r/473753 (owner: 10Filippo Giunchedi) [15:08:18] (03Merged) 10jenkins-bot: certcentral: split base path in config and certificates path [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473751 (https://phabricator.wikimedia.org/T209475) (owner: 10Alex Monk) [15:08:55] (03PS1) 10Vgutierrez: Release 0.7 [software/certcentral] - 10https://gerrit.wikimedia.org/r/473754 (https://phabricator.wikimedia.org/T208967) [15:10:32] (03CR) 10jenkins-bot: debian: Take into account /var/lib/certcentral [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473713 (https://phabricator.wikimedia.org/T209475) (owner: 10Vgutierrez) [15:11:02] (03CR) 10jenkins-bot: certcentral: split base path in config and certificates path 
[software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473751 (https://phabricator.wikimedia.org/T209475) (owner: 10Alex Monk) [15:12:56] (03PS2) 10Filippo Giunchedi: swift: lookup statsd host/port from hiera for stats_container [puppet] - 10https://gerrit.wikimedia.org/r/473753 [15:13:24] w/in 29 [15:13:28] nope! [15:13:45] (03CR) 10Alex Monk: [C: 032] Release 0.7 [software/certcentral] - 10https://gerrit.wikimedia.org/r/473754 (https://phabricator.wikimedia.org/T208967) (owner: 10Vgutierrez) [15:14:39] (03CR) 10jerkins-bot: [V: 04-1] swift: lookup statsd host/port from hiera for stats_container [puppet] - 10https://gerrit.wikimedia.org/r/473753 (owner: 10Filippo Giunchedi) [15:15:47] (03Merged) 10jenkins-bot: Release 0.7 [software/certcentral] - 10https://gerrit.wikimedia.org/r/473754 (https://phabricator.wikimedia.org/T208967) (owner: 10Vgutierrez) [15:16:14] (03CR) 10Sbisson: [C: 032] Beta labs: Configure sensitive namespaces for EditorJourney schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473653 (https://phabricator.wikimedia.org/T207307) (owner: 10Kosta Harlan) [15:16:43] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] "PCC https://puppet-compiler.wmflabs.org/compiler1002/13519/ms-fe1005.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/473753 (owner: 10Filippo Giunchedi) [15:16:44] PROBLEM - MariaDB Slave Lag: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 605.10 seconds [15:16:58] (03PS3) 10Filippo Giunchedi: swift: lookup statsd host/port from hiera for stats_container [puppet] - 10https://gerrit.wikimedia.org/r/473753 [15:17:31] (03Merged) 10jenkins-bot: Beta labs: Configure sensitive namespaces for EditorJourney schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473653 (https://phabricator.wikimedia.org/T207307) (owner: 10Kosta Harlan) [15:17:41] (03CR) 10jerkins-bot: [V: 04-1] swift: lookup statsd host/port from hiera for stats_container [puppet] - 10https://gerrit.wikimedia.org/r/473753 
(owner: 10Filippo Giunchedi) [15:18:04] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] swift: lookup statsd host/port from hiera for stats_container [puppet] - 10https://gerrit.wikimedia.org/r/473753 (owner: 10Filippo Giunchedi) [15:18:46] (03CR) 10jenkins-bot: Release 0.7 [software/certcentral] - 10https://gerrit.wikimedia.org/r/473754 (https://phabricator.wikimedia.org/T208967) (owner: 10Vgutierrez) [15:18:55] (03CR) 10jenkins-bot: Beta labs: Configure sensitive namespaces for EditorJourney schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473653 (https://phabricator.wikimedia.org/T207307) (owner: 10Kosta Harlan) [15:19:42] (03PS2) 10Sbisson: Configure sensitive namespaces for EditorJourney schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473742 (https://phabricator.wikimedia.org/T207307) (owner: 10Kosta Harlan) [15:22:34] (03PS1) 10Vgutierrez: acme_requests: Fix finalize_order() exception handling [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473757 (https://phabricator.wikimedia.org/T208967) [15:22:36] (03PS1) 10Vgutierrez: Release 0.7 [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473758 (https://phabricator.wikimedia.org/T208967) [15:26:26] (03PS1) 10Vgutierrez: debian: Add release 0.7 to changelog [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473759 (https://phabricator.wikimedia.org/T208967) [15:29:36] (03PS1) 10Muehlenhoff: Share keytabs directory for multiple services [puppet] - 10https://gerrit.wikimedia.org/r/473760 [15:32:28] RECOVERY - MariaDB Slave Lag: s4 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 269.82 seconds [15:35:56] 10Operations, 10Cloud-Services, 10cloud-services-team (Kanban): tools-k8s-master-01 has two floating IPs - https://phabricator.wikimedia.org/T164123 (10Andrew) a:05Andrew>03None Worst case, this will get resolved when we move regions [15:36:04] 10Operations, 10Cloud-Services, 10cloud-services-team (Kanban): tools-k8s-master-01 has two 
floating IPs - https://phabricator.wikimedia.org/T164123 (10Andrew) p:05Normal>03Low [15:43:27] (03PS3) 10Gehel: elasticsearch: create multiple elasticsearch instances on cirrus codfw [puppet] - 10https://gerrit.wikimedia.org/r/473258 (https://phabricator.wikimedia.org/T207918) [15:47:21] (03PS4) 10Gehel: elasticsearch: create multiple elasticsearch instances on cirrus codfw [puppet] - 10https://gerrit.wikimedia.org/r/473258 (https://phabricator.wikimedia.org/T207918) [15:51:21] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install new ms-be servers ms-be204[4-9] ,ms-be2050 - https://phabricator.wikimedia.org/T209395 (10Papaul) ` papaul@asw-a-codfw> show interfaces xe-2/0/4 descriptions Interface Admin Link Description xe-2/0/4 up up ms-be2044 papaul... [15:51:30] (03CR) 10Gehel: "PCC looks happy: https://puppet-compiler.wmflabs.org/compiler1002/13523/" [puppet] - 10https://gerrit.wikimedia.org/r/473258 (https://phabricator.wikimedia.org/T207918) (owner: 10Gehel) [15:52:37] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install new ms-be servers ms-be204[4-9] ,ms-be2050 - https://phabricator.wikimedia.org/T209395 (10Papaul) [15:55:23] (03PS1) 10Cwhite: admin: update ssh key for wmde-leszek [puppet] - 10https://gerrit.wikimedia.org/r/473765 (https://phabricator.wikimedia.org/T208717) [15:55:51] (03PS2) 10Bstorm: sonofgridengine: reworking bastion for stretch and docker [puppet] - 10https://gerrit.wikimedia.org/r/473647 (https://phabricator.wikimedia.org/T200557) [15:56:27] (03CR) 10Cwhite: [C: 032] admin: update ssh key for wmde-leszek [puppet] - 10https://gerrit.wikimedia.org/r/473765 (https://phabricator.wikimedia.org/T208717) (owner: 10Cwhite) [15:57:40] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to deployment for WMDE-leszek - https://phabricator.wikimedia.org/T208717 (10colewhite) 05Open>03Resolved [16:03:09] (03CR) 10Gehel: [C: 032] "tested on wdqs1009, works fine" [debs/prometheus-blazegraph-exporter] - 
10https://gerrit.wikimedia.org/r/473707 (https://phabricator.wikimedia.org/T206123) (owner: 10Mathew.onipe) [16:04:40] (03PS3) 10Bstorm: sonofgridengine: reworking bastion for stretch and docker [puppet] - 10https://gerrit.wikimedia.org/r/473647 (https://phabricator.wikimedia.org/T200557) [16:05:58] (03CR) 10Bstorm: [C: 032] sonofgridengine: reworking bastion for stretch and docker [puppet] - 10https://gerrit.wikimedia.org/r/473647 (https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm) [16:10:43] (03PS1) 10Mathew.onipe: changed version number in changelog [debs/prometheus-blazegraph-exporter] - 10https://gerrit.wikimedia.org/r/473767 [16:11:39] (03CR) 10Gehel: [C: 032] changed version number in changelog [debs/prometheus-blazegraph-exporter] - 10https://gerrit.wikimedia.org/r/473767 (owner: 10Mathew.onipe) [16:12:46] 10Operations: The icinga web interface can't read the icinga log file - https://phabricator.wikimedia.org/T209568 (10Dzahn) a:03Dzahn [16:14:51] (03PS1) 10Cwhite: icinga: ensure 0644 on /var/log/icinga/icinga.log to fix webui [puppet] - 10https://gerrit.wikimedia.org/r/473770 (https://phabricator.wikimedia.org/T209568) [16:15:31] (03PS2) 10Cwhite: icinga: ensure 0644 on /var/log/icinga/icinga.log to fix webui [puppet] - 10https://gerrit.wikimedia.org/r/473770 (https://phabricator.wikimedia.org/T209568) [16:18:55] 10Operations, 10Operations-Software-Development: Develop and deploy at least three Netbox reports to assist with data correctness and consistency - https://phabricator.wikimedia.org/T205899 (10Volans) As we'll be tackling this shortly, we should start deciding which report we want to write and what kind of pup... 
[16:19:48] (03CR) 10Dzahn: [C: 031] icinga: ensure 0644 on /var/log/icinga/icinga.log to fix webui [puppet] - 10https://gerrit.wikimedia.org/r/473770 (https://phabricator.wikimedia.org/T209568) (owner: 10Cwhite) [16:27:13] (03CR) 10Cwhite: [C: 032] icinga: ensure 0644 on /var/log/icinga/icinga.log to fix webui [puppet] - 10https://gerrit.wikimedia.org/r/473770 (https://phabricator.wikimedia.org/T209568) (owner: 10Cwhite) [16:27:53] (03PS1) 10Andrew Bogott: rename labvirt1015 to cloudvirt1015 [puppet] - 10https://gerrit.wikimedia.org/r/473773 (https://phabricator.wikimedia.org/T209531) [16:31:38] (03PS1) 10Andrew Bogott: rename labvirt1015 to cloudvirt1015 [dns] - 10https://gerrit.wikimedia.org/r/473774 [16:32:20] (03CR) 10Elukey: [C: 031] Share keytabs directory for multiple services [puppet] - 10https://gerrit.wikimedia.org/r/473760 (owner: 10Muehlenhoff) [16:32:24] (03CR) 10Andrew Bogott: [C: 032] rename labvirt1015 to cloudvirt1015 [puppet] - 10https://gerrit.wikimedia.org/r/473773 (https://phabricator.wikimedia.org/T209531) (owner: 10Andrew Bogott) [16:33:05] (03CR) 10Andrew Bogott: [C: 032] rename labvirt1015 to cloudvirt1015 [dns] - 10https://gerrit.wikimedia.org/r/473774 (owner: 10Andrew Bogott) [16:34:22] (03PS1) 10Andrew Bogott: cleanup after renaming labvirt1015 [dns] - 10https://gerrit.wikimedia.org/r/473775 [16:35:18] 10Operations, 10Patch-For-Review: The icinga web interface can't read the icinga log file - https://phabricator.wikimedia.org/T209568 (10colewhite) 05Open>03Resolved [16:37:22] !log rebuilding labvirt1015 and cloudvirt1015 [16:37:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:41:37] !log rebooted labsdb1007 for upgrades [16:41:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:42:20] 10Operations, 10OTRS: Upgrade to OTRS version 5.0.31 - https://phabricator.wikimedia.org/T209184 (10akosiaris) [16:52:46] (03PS1) 10Alexandros Kosiaris: package_builder: Switch to class 
declaration syntax [puppet] - 10https://gerrit.wikimedia.org/r/473782 [16:57:01] 10Operations, 10Operations-Software-Development: Develop and deploy at least three Netbox reports to assist with data correctness and consistency - https://phabricator.wikimedia.org/T205899 (10crusnov) I can't help but feel there may be a scenario where one wants to test a report on the test instance without i... [17:00:04] godog and _joe_: It is that lovely time of the day again! You are hereby commanded to deploy Puppet SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181115T1700). [17:00:04] thcipriani: A patch you scheduled for Puppet SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [17:01:14] (03PS1) 10Andrew Bogott: cloudvirt1015: disable notifications during rebuild [puppet] - 10https://gerrit.wikimedia.org/r/473784 (https://phabricator.wikimedia.org/T209531) [17:01:48] (03CR) 10Andrew Bogott: [C: 032] cloudvirt1015: disable notifications during rebuild [puppet] - 10https://gerrit.wikimedia.org/r/473784 (https://phabricator.wikimedia.org/T209531) (owner: 10Andrew Bogott) [17:02:43] * thcipriani *waves* [17:04:24] 10Operations, 10cloud-services-team (Kanban): Reboot WMCS servers for L1TF - https://phabricator.wikimedia.org/T207377 (10aborrero) [17:05:19] <_joe_> thcipriani: oh here you are [17:05:19] (03PS1) 10Andrew Bogott: cloudvirt1015: re-enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/473785 (https://phabricator.wikimedia.org/T209531) [17:05:52] _joe_: I'm sneaky :) [17:05:55] (03CR) 10Giuseppe Lavagetto: [C: 032] Gerrit: add basic robots.txt for proxy [puppet] - 10https://gerrit.wikimedia.org/r/473638 (https://phabricator.wikimedia.org/T209456) (owner: 10Thcipriani) [17:06:08] (03PS3) 10Giuseppe Lavagetto: Gerrit: add basic robots.txt for proxy [puppet] - 10https://gerrit.wikimedia.org/r/473638 
(https://phabricator.wikimedia.org/T209456) (owner: 10Thcipriani) [17:07:09] (03PS1) 10Filippo Giunchedi: swift: update statsd_exporter mappings for remaining metrics [puppet] - 10https://gerrit.wikimedia.org/r/473787 [17:07:27] (03PS1) 10Bstorm: sonofgridengine: remove storage mentions from docker config for bastion [puppet] - 10https://gerrit.wikimedia.org/r/473788 (https://phabricator.wikimedia.org/T200557) [17:07:45] (03PS2) 10Filippo Giunchedi: swift: update statsd_exporter mappings for remaining metrics [puppet] - 10https://gerrit.wikimedia.org/r/473787 [17:08:43] <_joe_> thcipriani: merged and puppet running on cobalt [17:08:57] <_joe_> Gerrit::Proxy/File[/var/www/robots.txt]: [17:09:00] <_joe_> done! [17:09:14] _joe_: awesome! thank you! [17:11:22] (03CR) 10Filippo Giunchedi: [C: 032] swift: update statsd_exporter mappings for remaining metrics [puppet] - 10https://gerrit.wikimedia.org/r/473787 (owner: 10Filippo Giunchedi) [17:14:17] (03PS2) 10Bstorm: sonofgridengine: remove storage mentions from docker config for bastion [puppet] - 10https://gerrit.wikimedia.org/r/473788 (https://phabricator.wikimedia.org/T200557) [17:15:31] (03PS1) 10Cwhite: icinga: manage permissions for replicated files [puppet] - 10https://gerrit.wikimedia.org/r/473789 (https://phabricator.wikimedia.org/T208824) [17:16:48] (03CR) 10Alexandros Kosiaris: [C: 032] "wmf-style: total violations delta -2" [puppet] - 10https://gerrit.wikimedia.org/r/473782 (owner: 10Alexandros Kosiaris) [17:16:56] (03PS2) 10Alexandros Kosiaris: package_builder: Switch to class declaration syntax [puppet] - 10https://gerrit.wikimedia.org/r/473782 [17:19:30] (03CR) 10Bstorm: [C: 032] sonofgridengine: remove storage mentions from docker config for bastion [puppet] - 10https://gerrit.wikimedia.org/r/473788 (https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm) [17:20:56] !log upgrade prometheus-blazegraph-exporter on all wdqs nodes - T206123 [17:20:59] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:21:00] T206123: Monitor query / request concurrency on Blazegraph - https://phabricator.wikimedia.org/T206123 [17:25:17] (03CR) 10EBernhardson: elasticsearch: create multiple elasticsearch instances on cirrus codfw (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/473258 (https://phabricator.wikimedia.org/T207918) (owner: 10Gehel) [17:27:18] !log anomie@mwmaint1002 Running refreshExternallinksIndex.php on section 1 wikis in group 2 for T209373. This may cause lag in codfw. [17:27:18] !log anomie@mwmaint1002 Running refreshExternallinksIndex.php on section 2 wikis in group 2 for T209373. This may cause lag in codfw. [17:27:18] !log anomie@mwmaint1002 Running refreshExternallinksIndex.php on section 3 wikis in group 2 for T209373. This may cause lag in codfw. [17:27:19] !log anomie@mwmaint1002 Running refreshExternallinksIndex.php on section 5 wikis in group 2 for T209373. This may cause lag in codfw. [17:27:19] !log anomie@mwmaint1002 Running refreshExternallinksIndex.php on section 6 wikis in group 2 for T209373. This may cause lag in codfw. [17:27:19] !log anomie@mwmaint1002 Running refreshExternallinksIndex.php on section 7 wikis in group 2 for T209373. This may cause lag in codfw. 
[17:27:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:27:22] T209373: Run maintenance/refreshExternallinksIndex.php on all wikis - https://phabricator.wikimedia.org/T209373 [17:27:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:27:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:27:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:27:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:27:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:29:37] (03PS1) 10Bstorm: sonofgridengine: remove dependency on storage part [puppet] - 10https://gerrit.wikimedia.org/r/473793 (https://phabricator.wikimedia.org/T200557) [17:30:45] (03CR) 10Bstorm: [C: 032] sonofgridengine: remove dependency on storage part [puppet] - 10https://gerrit.wikimedia.org/r/473793 (https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm) [17:34:07] (03PS8) 10Volans: Add Icinga module [software/spicerack] - 10https://gerrit.wikimedia.org/r/473506 (https://phabricator.wikimedia.org/T205884) [17:34:09] (03PS2) 10Volans: Add Puppet module [software/spicerack] - 10https://gerrit.wikimedia.org/r/473735 (https://phabricator.wikimedia.org/T205884) [17:34:11] (03PS1) 10Volans: Add administrative module [software/spicerack] - 10https://gerrit.wikimedia.org/r/473796 (https://phabricator.wikimedia.org/T205884) [17:34:25] (03CR) 10Volans: "replies inline" (034 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/473735 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [17:35:02] (03CR) 10Volans: "Refactored based on the new Reason class" [software/spicerack] - 10https://gerrit.wikimedia.org/r/473506 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [17:35:23] (03CR) 10Dzahn: [C: 031] icinga: manage permissions for replicated files [puppet] - 
10https://gerrit.wikimedia.org/r/473789 (https://phabricator.wikimedia.org/T208824) (owner: 10Cwhite) [17:36:45] (03PS2) 10Cwhite: icinga: manage permissions for replicated files [puppet] - 10https://gerrit.wikimedia.org/r/473789 (https://phabricator.wikimedia.org/T208824) [17:38:39] Not entirely sure this is the best place for this question, but giving it a shot... If I want to make a change to a config variable that's set in mobile.php, but I only want that change to apply to certain wikis, what's the best way to do that? Wrap it in an `if ($wgDBname ==....)` clause? [17:39:13] I haven't been able to find an example of this kind of conditional in a sub-import that I'm confident is current best practice. [17:41:54] (03CR) 10Volans: [C: 04-1] "I don't think we should manage the permissions of those files in Puppet as they would be created by Icinga with the right permissions. And" [puppet] - 10https://gerrit.wikimedia.org/r/473789 (https://phabricator.wikimedia.org/T208824) (owner: 10Cwhite) [17:43:51] marlier: Checking $wgDBname would work but won't scale very well. The scalable way to do it would be to create an appropriate $wmgFooBar variable in InitialiseSettings.php so you can use it in mobile.php. [17:43:59] (03PS1) 10Herron: logstash: set rsyslog-shipper input type to syslog [puppet] - 10https://gerrit.wikimedia.org/r/473800 (https://phabricator.wikimedia.org/T206454) [17:47:52] anomie: thanks! The use case here is testing the effect of a specific setting on Google index efficiency. It'll get enabled for about 6 wikis, and then either get turned on for all or get turned off for all. Given that (i.e., scaling isn't that big a deal), which approach do you think makes more sense? 
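The two options raised so far can be contrasted with a small sketch. It is written in Python rather than PHP purely for illustration, and every name in it (`SETTINGS`, `wmgExampleFlag`, `resolve`, `hardcoded_check`, the wiki list) is hypothetical and not taken from the real configuration. It assumes a per-wiki settings table with a `default` entry plus per-database overrides, which is roughly the shape a `$wmg` variable in InitialiseSettings.php would take.

```python
# Hypothetical per-wiki settings table: a default plus per-dbname overrides.
SETTINGS = {
    "wmgExampleFlag": {
        "default": False,
        "frwiki": True,
        "jawiki": True,
    },
}


def resolve(setting, dbname):
    """Table-driven lookup: the consuming code never names a specific wiki."""
    per_wiki = SETTINGS[setting]
    return per_wiki.get(dbname, per_wiki["default"])


def hardcoded_check(dbname):
    """Inline alternative: the wiki list lives next to the code that uses it."""
    return dbname in ("frwiki", "jawiki")


# Both approaches agree for this small, fixed set of wikis.
for wiki in ("frwiki", "jawiki", "enwiki"):
    assert resolve("wmgExampleFlag", wiki) == hardcoded_check(wiki)
```

With the table, adding or removing a wiki is a data change in one place. With the inline check, every consumer of the flag has to be edited, which is acceptable for a short-lived test on a handful of wikis and gets harder to maintain as the list grows.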
[17:48:18] (03CR) 10Herron: "pcc looks good https://puppet-compiler.wmflabs.org/compiler1002/13526/" [puppet] - 10https://gerrit.wikimedia.org/r/473800 (https://phabricator.wikimedia.org/T206454) (owner: 10Herron) [17:48:37] (03CR) 10Herron: [C: 032] logstash: set rsyslog-shipper input type to syslog [puppet] - 10https://gerrit.wikimedia.org/r/473800 (https://phabricator.wikimedia.org/T206454) (owner: 10Herron) [17:51:57] marlier: The deal with "scaling" there is having to maintain the check in mobile.php. If it's reasonably short-term and you're not going to be adding or removing wikis from that "about 6" much then testing $wgDBname would probably be easier. If you wind up wanting to have it on on some wikis and off on others long-term, or changing it based on language or site, then you'd want to switch to a wmg. [17:52:00] (03PS5) 10Gehel: elasticsearch: create multiple elasticsearch instances on cirrus codfw [puppet] - 10https://gerrit.wikimedia.org/r/473258 (https://phabricator.wikimedia.org/T207918) [17:52:47] anomie: 👍 [17:53:46] (03CR) 10Cwhite: "> I don't think we should manage the permissions of those files in" [puppet] - 10https://gerrit.wikimedia.org/r/473789 (https://phabricator.wikimedia.org/T208824) (owner: 10Cwhite) [18:00:04] cscott, arlolra, subbu, halfak, and Amir1: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Services – Graphoid / Parsoid / Citoid / ORES . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181115T1800). [18:00:12] no parsoid deploy [18:00:17] I'm going to deploy something for ores [18:00:25] (03CR) 10Dzahn: "thanks! i think it should be deployed after einsteinium stops using the alerting_host role. 
so that would be after https://gerrit.wikimedi" [puppet] - 10https://gerrit.wikimedia.org/r/473276 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn) [18:01:35] Rollback rev: bb39f4b3d6e9fd7410a4bd4cb4e1f44b176c5606 (just keeping it here in case) [18:02:04] (03CR) 10Dzahn: [C: 031] "i agree this should be merged and can be removed again after we remove jessie support. for now einsteinium is not really decom'ed yet and " [puppet] - 10https://gerrit.wikimedia.org/r/473789 (https://phabricator.wikimedia.org/T208824) (owner: 10Cwhite) [18:03:04] !log ladsgroup@deploy1001 Started deploy [ores/deploy@51cdf6b]: T208623 [18:03:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:03:08] T208623: ORES should log 500 responses - https://phabricator.wikimedia.org/T208623 [18:03:22] 10Operations, 10ops-codfw, 10Services: rack/setup/install restbase201[3-8].codfw.wmnet - https://phabricator.wikimedia.org/T209615 (10RobH) p:05Triage>03High [18:05:39] 10Operations, 10Performance-Team: Evaluate scalability and performance of PHP7 compared to HHVM - https://phabricator.wikimedia.org/T206341 (10Joe) Here are the first results that I feel comfortable sharing! I tested the english wikipedia pages for Australia (a mid-sized, not overly complex page) and Barack O... 
[18:07:23] (03PS6) 10Gehel: elasticsearch: create multiple elasticsearch instances on cirrus codfw [puppet] - 10https://gerrit.wikimedia.org/r/473258 (https://phabricator.wikimedia.org/T207918) [18:07:56] (03CR) 10Gehel: elasticsearch: create multiple elasticsearch instances on cirrus codfw (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/473258 (https://phabricator.wikimedia.org/T207918) (owner: 10Gehel) [18:08:59] <_joe_> SMalyshev: you might enjoy https://phabricator.wikimedia.org/T206341#4750994 :) [18:11:41] (03CR) 10Gehel: "pcc looks reasonable: https://puppet-compiler.wmflabs.org/compiler1002/13527/" [puppet] - 10https://gerrit.wikimedia.org/r/473258 (https://phabricator.wikimedia.org/T207918) (owner: 10Gehel) [18:12:23] (03CR) 10Volans: [C: 04-1] "But this is not solving the issue until the next Icinga restart happens. The resources are not restarting Icinga." [puppet] - 10https://gerrit.wikimedia.org/r/473789 (https://phabricator.wikimedia.org/T208824) (owner: 10Cwhite) [18:12:41] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team: rack/setup/install cloudvirt10[25-30].eqiad.wmnet - https://phabricator.wikimedia.org/T209616 (10RobH) p:05Triage>03High [18:13:59] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team: rack/setup/install cloudvirt10[25-30].eqiad.wmnet - https://phabricator.wikimedia.org/T209616 (10RobH) [18:15:39] 10Operations, 10Wikimedia-Mailing-lists: Requesting creation of librarycard-dev mailing list - https://phabricator.wikimedia.org/T209081 (10Samwalton9) The list may contain sensitive data from the tool so we'd rather the archives be private. 
[18:17:46] !log ladsgroup@deploy1001 Finished deploy [ores/deploy@51cdf6b]: T208623 (duration: 14m 41s) [18:17:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:17:49] T208623: ORES should log 500 responses - https://phabricator.wikimedia.org/T208623 [18:18:07] !log ladsgroup@deploy1001 Started deploy [ores/deploy@dba11e9]: Another small update [18:18:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:19:52] 10Operations, 10procurement: rack/setup/install ms-be10[44-50].eqiad.wmnet - https://phabricator.wikimedia.org/T209618 (10RobH) p:05Triage>03High [18:21:49] RECOVERY - MariaDB Slave Lag: pc3 on pc2009 is OK: OK slave_sql_lag Replication lag: 0.51 seconds [18:22:36] 10Operations, 10procurement: rack/setup/install ms-be10[44-50].eqiad.wmnet - https://phabricator.wikimedia.org/T209618 (10RobH) Well, netbox makes reviewing current rack placement easier, since it summarizes it: https://netbox.wikimedia.org/dcim/devices/?q=ms-be&site=eqiad&mac_address=&has_primary_ip=&cf_owne... [18:24:59] 10Operations, 10ops-eqiad, 10media-storage: rack/setup/install ms-be10[44-50].eqiad.wmnet - https://phabricator.wikimedia.org/T209618 (10RobH) [18:25:52] 10Operations, 10Scoring-platform-team, 10Wikimedia-Incident: ORES overload incident, 2017-11-28 - https://phabricator.wikimedia.org/T181538 (10Ladsgroup) [18:25:55] 10Operations, 10Scoring-platform-team (Current), 10User-Ladsgroup, 10Wikimedia-Incident: Celery manager implodes horribly if Redis goes down - https://phabricator.wikimedia.org/T181632 (10Ladsgroup) 05Open>03Resolved [18:26:31] 10Operations, 10Scoring-platform-team (Current), 10User-Ladsgroup, 10Wikimedia-Incident: Investigate redis-cluster or other techniques for making Redis not a single point of failure. 
- https://phabricator.wikimedia.org/T181559 (10Ladsgroup) a:03Ladsgroup We decided to go with redis-sentinel [18:26:47] 10Operations, 10Scoring-platform-team, 10Wikimedia-Incident: ORES overload incident, 2017-11-28 - https://phabricator.wikimedia.org/T181538 (10Ladsgroup) [18:26:49] 10Operations, 10Scoring-platform-team (Current), 10User-Ladsgroup, 10Wikimedia-Incident: Investigate redis-cluster or other techniques for making Redis not a single point of failure. - https://phabricator.wikimedia.org/T181559 (10Ladsgroup) 05Open>03Resolved [18:28:41] (03PS1) 10Papaul: Partman: Add new ms-be systems [puppet] - 10https://gerrit.wikimedia.org/r/473810 (https://phabricator.wikimedia.org/T209395) [18:29:00] 10Operations, 10ops-eqiad, 10Analytics, 10User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (10RobH) p:05Triage>03High [18:29:42] (03CR) 10Cwhite: "> But this is not solving the issue until the next Icinga restart" [puppet] - 10https://gerrit.wikimedia.org/r/473789 (https://phabricator.wikimedia.org/T208824) (owner: 10Cwhite) [18:30:15] 10Operations, 10ops-eqiad, 10Analytics, 10User-Elukey: rack/setup/install dbstore100[3-5].eqiad.wmnet - https://phabricator.wikimedia.org/T209620 (10RobH) @elukey: Can you confirm the racking and hostname details? Hostname Proposal: dbstore100[3-5].eqiad.wmnet, since this is a dbstore1002 replacement or... 
[18:30:28] (03CR) 10Andrew Bogott: [C: 032] cleanup after renaming labvirt1015 [dns] - 10https://gerrit.wikimedia.org/r/473775 (owner: 10Andrew Bogott) [18:30:44] (03PS2) 10Andrew Bogott: cloudvirt1015: re-enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/473785 (https://phabricator.wikimedia.org/T209531) [18:31:27] (03CR) 10Andrew Bogott: [C: 032] cloudvirt1015: re-enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/473785 (https://phabricator.wikimedia.org/T209531) (owner: 10Andrew Bogott) [18:31:49] !log ladsgroup@deploy1001 Finished deploy [ores/deploy@dba11e9]: Another small update (duration: 13m 42s) [18:31:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:34:42] (03CR) 10Jforrester: "This broke WBMI on Beta Commons, BTW, as you don't specify a default for sites that aren't Wikidata.org any more. :-P" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473716 (https://phabricator.wikimedia.org/T154660) (owner: 10Tarrow) [18:37:06] 10Operations, 10Traffic, 10Wikimedia-Incident: Add maint-announce@ to Equinix's recipient list for eqsin incidents - https://phabricator.wikimedia.org/T207140 (10RobH) > Vivian, > > Today I recieved a notice about "COMPLETED - Scheduled Generator Capacity Upgrade at the SG3 IBX [5-168459376275]" to my rhals... [18:37:12] addshore: ^^ [18:37:31] :O [18:37:51] oh yes, it might break it come to think of it [18:37:54] :P [18:38:00] as the default isn't an empty array [18:38:05] ;0 [18:38:13] :/ [18:38:42] 'default' => null, and then in Wikibase.php check for existence, and only set if the wmg global is set [18:39:08] yep, swat it in tomorrow? [18:39:20] mhhm, can do it now [18:39:47] I can't sadly :/ I have to be gone in 5mins. 
I'll be back later though [18:39:58] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): Update label and switch to rename labvirt1015 to cloudvirt1015 - https://phabricator.wikimedia.org/T209622 (10Andrew) [18:40:02] i can do it :) [18:40:06] jouncebot: Now [18:40:06] For the next 0 hour(s) and 19 minute(s): Services – Graphoid / Parsoid / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181115T1800) [18:40:13] jouncebot: Next [18:40:14] In 0 hour(s) and 19 minute(s): Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181115T1900) [18:42:20] (03PS1) 10Jforrester: Follow-up c611826: Define the `multilang` string length for Commons again [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473813 [18:43:28] (03PS1) 10Addshore: Wikibase.php, only set string-limits if $wmg var is set [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473814 [18:43:33] (03PS1) 10Tarrow: Don't set Wikibase StringLengths on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473815 (https://phabricator.wikimedia.org/T154660) [18:43:38] Well. [18:43:44] I don't think we need all three. :-) [18:43:44] (03PS1) 10Addshore: wmgWikibaseStringLimits default null [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473816 [18:43:48] James_F: ^^ [18:44:03] haha, beaten [18:44:04] haha, i made mine swat ready / in the order they need to be deployed ;) [18:44:18] Sure, I'll take addshore's. [18:44:21] :D [18:44:23] Sorry tarrow. ;-) [18:44:24] can't they be deployed together? [18:44:33] I'm deploying now unless anyone shouts. [18:44:40] no problem! 
I have to go now anyway :) [18:44:40] tarrow: no, if you sync IS.php first, then Wikibase.php will start warnings [18:44:54] null => dont set the global at all for IS.php :/ [18:44:57] 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: Wikidata Query Service is overly verbose toward logstash - https://phabricator.wikimedia.org/T150356 (10Smalyshev) 05Open>03Resolved a:03Smalyshev Should be fixed with new logging system. [18:45:02] (03CR) 10jerkins-bot: [V: 04-1] Don't set Wikibase StringLengths on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473815 (https://phabricator.wikimedia.org/T154660) (owner: 10Tarrow) [18:45:08] Personally thats something we should totally fix, but meh [18:45:24] (03Abandoned) 10Tarrow: Don't set Wikibase StringLengths on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473815 (https://phabricator.wikimedia.org/T154660) (owner: 10Tarrow) [18:45:26] (03CR) 10Jforrester: [C: 032] Wikibase.php, only set string-limits if $wmg var is set [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473814 (owner: 10Addshore) [18:45:42] 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service, 10Patch-For-Review: Sanitizing input and increase throttling rate for wdqs errors to prevent spamming logstash - https://phabricator.wikimedia.org/T207643 (10Smalyshev) 05Open>03Resolved Should be ok now. 
[18:45:46] (03Abandoned) 10Jforrester: Follow-up c611826: Define the `multilang` string length for Commons again [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473813 (owner: 10Jforrester) [18:45:48] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team: rack/setup/install cloudvirt10[25-30].eqiad.wmnet - https://phabricator.wikimedia.org/T209616 (10Andrew) [18:46:49] (03Merged) 10jenkins-bot: Wikibase.php, only set string-limits if $wmg var is set [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473814 (owner: 10Addshore) [18:48:13] (03CR) 10EBernhardson: [C: 031] elasticsearch: create multiple elasticsearch instances on cirrus codfw [puppet] - 10https://gerrit.wikimedia.org/r/473258 (https://phabricator.wikimedia.org/T207918) (owner: 10Gehel) [18:50:12] !log jforrester@deploy1001 Synchronized wmf-config/Wikibase.php: Hot-deploy Ib10de2e3: Don't set Wikibase string limits when null (duration: 00m 55s) [18:50:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:50:37] (03CR) 10Jforrester: [C: 032] wmgWikibaseStringLimits default null [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473816 (owner: 10Addshore) [18:51:16] addshore, tarrow: Thank you both. :-) [18:51:22] James_F: np :) [18:51:23] (03CR) 1020after4: [C: 031] "+1 fwiw" [puppet] - 10https://gerrit.wikimedia.org/r/472951 (https://phabricator.wikimedia.org/T209176) (owner: 10ArielGlenn) [18:51:28] 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: Remove /srv/deployment/wdqs/wdqs/rules.log symlink - https://phabricator.wikimedia.org/T144539 (10Smalyshev) This file is mentioned as appender for `com.bigdata.relation.rule.eval.RuleLog` but I don't think we even use these logging configs a... [18:51:47] James_F: I'm starting to think the best option for all of the wmg options vars in Wikibase.php is to wrap them all in isset() conditions ... 
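The fix discussed between 18:38 and 18:51 (a `null` default plus an existence check in Wikibase.php) might look roughly like the sketch below. `$wmgWikibaseStringLimits` is named in the patch titles above; the `$wgWBRepoSettings` key and the sample values are assumptions for illustration and are not taken from the merged patches.

```php
<?php
// Sketch only: the 'string-limits' key and the sample values are
// illustrative assumptions, not copied from the actual patches.

// InitialiseSettings.php side: wikis without an override get null.
$wmgWikibaseStringLimits = null;

// Stand-in for the defaults the extension would otherwise provide.
$wgWBRepoSettings = [
    'string-limits' => [ 'multilang' => [ 'length' => 250 ] ],
];

// Wikibase.php side: only override when a value was actually configured.
// isset() is false both for an undefined variable and for null, so the
// defaults above are left untouched on wikis with no override.
if ( isset( $wmgWikibaseStringLimits ) ) {
    $wgWBRepoSettings['string-limits'] = $wmgWikibaseStringLimits;
}
```

This also matches the deploy order used in the log: the guarded Wikibase.php was synced at 18:50:12, before the InitialiseSettings.php change at 18:55:46, so the unguarded code never saw the new `null` default.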
[18:51:49] (03Merged) 10jenkins-bot: wmgWikibaseStringLimits default null [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473816 (owner: 10Addshore) [18:51:59] (03PS1) 10Papaul: DHCP: ADD MAC address entries for ms-be204[4-9] and ms-be2050 [puppet] - 10https://gerrit.wikimedia.org/r/473817 (https://phabricator.wikimedia.org/T209395) [18:52:07] James_F: but thats for a future day [18:52:08] addshore: Or maybe bake that into multiversion.php? [18:52:27] James_F: yes, or make IS and multiversion able to set array keys in the first place [18:52:33] Or that. :-) [18:52:49] Or… maybe we should just bite the bullet and move to ProductionConfig.json [18:53:07] How would that work? / does it exist? [18:53:22] (03PS4) 10Kosta Harlan: Switch on data collection for Understanding First Day project [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471792 (https://phabricator.wikimedia.org/T208773) [18:53:25] 10Operations, 10Operations-Software-Development: Develop and deploy at least three Netbox reports to assist with data correctness and consistency - https://phabricator.wikimedia.org/T205899 (10ayounsi) More report suggestions I have in mind: - Display all active network devices not connected to a console serve... [18:53:28] Static configuration rather than twenty magic code files arguing with each other. [18:53:33] The Dream™. [18:53:38] hehe [18:54:19] (03PS2) 10BBlack: Lower WebP thumbnail hotness threshold further [puppet] - 10https://gerrit.wikimedia.org/r/472436 (https://phabricator.wikimedia.org/T27611) (owner: 10Gilles) [18:55:05] (03CR) 10BBlack: [C: 032] Lower WebP thumbnail hotness threshold further [puppet] - 10https://gerrit.wikimedia.org/r/472436 (https://phabricator.wikimedia.org/T27611) (owner: 10Gilles) [18:55:22] It'd be nice to have Special:SiteConfiguration as a (read-only) way for communities to see what things we've got set where. [18:55:37] Florian and others have worked on the dream over time. 
[18:55:46] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Hot-deploy I26f2dc2e: Don't over-ride default Wikibase string limits (duration: 00m 53s) [18:55:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:56:43] I surrender the conch. [18:56:57] (03CR) 10jenkins-bot: Wikibase.php, only set string-limits if $wmg var is set [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473814 (owner: 10Addshore) [18:56:59] (03CR) 10jenkins-bot: wmgWikibaseStringLimits default null [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473816 (owner: 10Addshore) [18:59:48] PROBLEM - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is CRITICAL: HTTP CRITICAL - No data received from host [19:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the Morning SWAT (Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181115T1900). [19:00:04] Zoranzoki21 and kostajh: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [19:00:23] \o [19:00:50] RECOVERY - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 283 bytes in 0.473 second response time [19:00:50] taking a look at thumbor [19:00:52] what's up with thumbor? [19:00:54] thumbor page is from the change I just merged [19:00:56] ah nevermind [19:00:57] thanks [19:00:58] already being looked at [19:01:03] I was about to ask (b black) [19:01:06] ack, thx [19:01:11] I'll do the SWAT today [19:01:23] Just gotta get back to my desk [19:01:29] ack [19:01:44] WebP usage went up, there was a short thumbor traffic spike, it should subside. [19:02:04] k [19:09:59] Zoranzoki21: Are you here for your SWAT? 
[19:11:39] 10Operations, 10User-jijiki: Add option maxmemory-policy: 'volatile-lru' on Redis class for debian stretch - https://phabricator.wikimedia.org/T209628 (10jijiki) [19:12:07] 10Operations, 10User-jijiki: Add option maxmemory-policy: 'volatile-lru' on Redis class for debian stretch - https://phabricator.wikimedia.org/T209628 (10jijiki) p:05Triage>03Normal a:03jijiki [19:12:41] 10Operations, 10User-jijiki: Add option maxmemory-policy: 'volatile-lru' on Redis class for debian stretch - https://phabricator.wikimedia.org/T209628 (10jijiki) [19:14:10] (03PS1) 10Dzahn: remove tegmen.mgmt which was duplicated for renaming [dns] - 10https://gerrit.wikimedia.org/r/473820 (https://phabricator.wikimedia.org/T208824) [19:15:24] (03CR) 10Dzahn: [C: 032] remove tegmen.mgmt which was duplicated for renaming [dns] - 10https://gerrit.wikimedia.org/r/473820 (https://phabricator.wikimedia.org/T208824) (owner: 10Dzahn) [19:21:28] (03PS1) 10Bstorm: sonofgridengine: unbreak the bastion profile [puppet] - 10https://gerrit.wikimedia.org/r/473822 (https://phabricator.wikimedia.org/T209627) [19:22:34] (03CR) 10Bstorm: [C: 032] sonofgridengine: unbreak the bastion profile [puppet] - 10https://gerrit.wikimedia.org/r/473822 (https://phabricator.wikimedia.org/T209627) (owner: 10Bstorm) [19:24:08] (03CR) 10Catrope: [C: 032] Configure sensitive namespaces for EditorJourney schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473742 (https://phabricator.wikimedia.org/T207307) (owner: 10Kosta Harlan) [19:26:44] (03PS3) 10Catrope: Configure sensitive namespaces for EditorJourney schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473742 (https://phabricator.wikimedia.org/T207307) (owner: 10Kosta Harlan) [19:26:54] (03CR) 10Catrope: Configure sensitive namespaces for EditorJourney schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473742 (https://phabricator.wikimedia.org/T207307) (owner: 10Kosta Harlan) [19:26:58] (03CR) 10Catrope: [C: 032] Configure sensitive 
namespaces for EditorJourney schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473742 (https://phabricator.wikimedia.org/T207307) (owner: 10Kosta Harlan) [19:28:04] 10Operations, 10cloud-services-team, 10Continuous-Integration-Infrastructure (shipyard), 10Nodepool, 10Release-Engineering-Team (Kanban): Phase out Nodepool from production - https://phabricator.wikimedia.org/T209361 (10hashar) [19:28:12] (03Merged) 10jenkins-bot: Configure sensitive namespaces for EditorJourney schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473742 (https://phabricator.wikimedia.org/T207307) (owner: 10Kosta Harlan) [19:28:44] 10Operations, 10cloud-services-team, 10Continuous-Integration-Infrastructure (shipyard), 10Nodepool, 10Release-Engineering-Team (Kanban): Phase out Nodepool from production - https://phabricator.wikimedia.org/T209361 (10hashar) I have migrated the last jobs still using Nodepool. Ready to phase out Nodepo... [19:39:52] (03CR) 10jenkins-bot: Configure sensitive namespaces for EditorJourney schema [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473742 (https://phabricator.wikimedia.org/T207307) (owner: 10Kosta Harlan) [19:45:56] (03PS1) 10Andrew Bogott: nova: depool labvirt1010 and 1011 [puppet] - 10https://gerrit.wikimedia.org/r/473828 (https://phabricator.wikimedia.org/T209626) [19:46:01] (03PS2) 10Papaul: DNS: Add production and mgmt DNS entries for ms-be200[4-9] and ms-be2050 [dns] - 10https://gerrit.wikimedia.org/r/473646 (https://phabricator.wikimedia.org/T209395) [19:47:31] (03CR) 10Addshore: "This was pt1 of a fix for https://phabricator.wikimedia.org/T154660#4751620" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473814 (owner: 10Addshore) [19:47:37] (03CR) 10Addshore: "This was pt2 of a fix for https://phabricator.wikimedia.org/T154660#4751620" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473816 (owner: 10Addshore) [19:49:09] !log catrope@deploy1001 Synchronized php-1.33.0-wmf.4/extensions/WikimediaEvents/: 
cherry-picks for T208773 (duration: 00m 54s) [19:49:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:49:16] T208773: Enable $wgWMEUnderstandingFirstDay on Testwiki, Czech and Korean wikis - https://phabricator.wikimedia.org/T208773 [19:51:11] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Configure sensitive namespaces for EditorJourney schema (T207307) (duration: 00m 53s) [19:51:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:51:16] T207307: Understanding first day: testing and QA - https://phabricator.wikimedia.org/T207307 [19:53:35] (03PS5) 10Catrope: Switch on data collection for Understanding First Day project [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471792 (https://phabricator.wikimedia.org/T208773) (owner: 10Kosta Harlan) [19:53:40] (03CR) 10Catrope: [C: 032] Switch on data collection for Understanding First Day project [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471792 (https://phabricator.wikimedia.org/T208773) (owner: 10Kosta Harlan) [19:54:05] (03CR) 10Andrew Bogott: [C: 032] nova: depool labvirt1010 and 1011 [puppet] - 10https://gerrit.wikimedia.org/r/473828 (https://phabricator.wikimedia.org/T209626) (owner: 10Andrew Bogott) [19:54:35] (03PS1) 10Andrew Bogott: horizon: enable eqiad1-r for the 'video' project [puppet] - 10https://gerrit.wikimedia.org/r/473830 (https://phabricator.wikimedia.org/T209632) [19:55:12] (03Merged) 10jenkins-bot: Switch on data collection for Understanding First Day project [mediawiki-config] - 10https://gerrit.wikimedia.org/r/471792 (https://phabricator.wikimedia.org/T208773) (owner: 10Kosta Harlan) [19:55:19] (03CR) 10Andrew Bogott: [C: 032] horizon: enable eqiad1-r for the 'video' project [puppet] - 10https://gerrit.wikimedia.org/r/473830 (https://phabricator.wikimedia.org/T209632) (owner: 10Andrew Bogott) [19:55:26] (03CR) 10jenkins-bot: Switch on data collection for Understanding First Day project 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/471792 (https://phabricator.wikimedia.org/T208773) (owner: 10Kosta Harlan) [19:57:55] (03PS3) 10Niedzielski: Doc: add repoConceptBaseUri comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473292 (https://phabricator.wikimedia.org/T209352) [19:58:23] (03PS4) 10Niedzielski: Doc: add repoConceptBaseUri comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473292 (https://phabricator.wikimedia.org/T209352) [19:58:52] (03CR) 10Niedzielski: "@lucas, revised to add documentation via your comment." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473292 (https://phabricator.wikimedia.org/T209352) (owner: 10Niedzielski) [20:00:04] Deploy window MediaWiki train - Americas version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181115T2000) [20:01:55] (03PS1) 10Andrew Bogott: nova: add cloudvirt1015 to the scheduler pool [puppet] - 10https://gerrit.wikimedia.org/r/473831 (https://phabricator.wikimedia.org/T209531) [20:03:20] (03CR) 10Andrew Bogott: [C: 032] nova: add cloudvirt1015 to the scheduler pool [puppet] - 10https://gerrit.wikimedia.org/r/473831 (https://phabricator.wikimedia.org/T209531) (owner: 10Andrew Bogott) [20:04:12] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable data collection for UnderstandingFirstDay on cswiki and kowiki (duration: 00m 53s) [20:04:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:04:34] !log dropping disused keyspaces -- T208616 [20:04:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:04:38] T208616: Drop section-offsets keyspaces - https://phabricator.wikimedia.org/T208616 [20:07:18] 10Operations, 10cloud-services-team, 10Continuous-Integration-Infrastructure (shipyard), 10Nodepool, and 2 others: Phase out Nodepool from production - https://phabricator.wikimedia.org/T209361 (10hashar) [20:11:19] !log mholloway-shell@deploy1001 Started deploy 
[kartotherian/deploy@48a1e83]: Fix: Loosen WDQS content-type header check to unbreak maps (T209471) [20:11:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:11:22] T209471: HTTP 400 error "Bad geojson - unknown type ExternalData" for Maps from Wikidata query; nothing displayed - https://phabricator.wikimedia.org/T209471 [20:11:37] (03PS1) 10Bstorm: sonofgridengine: add cronrunner role for stretch grid [puppet] - 10https://gerrit.wikimedia.org/r/473833 (https://phabricator.wikimedia.org/T200557) [20:12:18] (03PS1) 10Hashar: nodepool: labtestservices2003 is not used for testing [puppet] - 10https://gerrit.wikimedia.org/r/473834 (https://phabricator.wikimedia.org/T209361) [20:12:42] (03CR) 10jerkins-bot: [V: 04-1] sonofgridengine: add cronrunner role for stretch grid [puppet] - 10https://gerrit.wikimedia.org/r/473833 (https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm) [20:14:05] (03PS2) 10Bstorm: sonofgridengine: add cronrunner role for stretch grid [puppet] - 10https://gerrit.wikimedia.org/r/473833 (https://phabricator.wikimedia.org/T200557) [20:15:45] !log mholloway-shell@deploy1001 Finished deploy [kartotherian/deploy@48a1e83]: Fix: Loosen WDQS content-type header check to unbreak maps (T209471) (duration: 04m 26s) [20:15:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:17:19] (03PS1) 10Niedzielski: BC Wikibase: override repoConceptBaseUri [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473835 (https://phabricator.wikimedia.org/T209352) [20:17:55] !log re-added Chase to pwstore, signed .users file, re-encrypted all pwstore files, git pushed [20:17:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:11] !log mholloway-shell@deploy1001 Started deploy [kartotherian/deploy@UNKNOWN]: Fix: Loosen WDQS content-type header check to unbreak maps (T209471) [20:18:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:15] T209471: 
HTTP 400 error "Bad geojson - unknown type ExternalData" for Maps from Wikidata query; nothing displayed - https://phabricator.wikimedia.org/T209471 [20:22:06] !log mholloway-shell@deploy1001 Finished deploy [kartotherian/deploy@UNKNOWN]: Fix: Loosen WDQS content-type header check to unbreak maps (T209471) (duration: 03m 55s) [20:22:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:26:10] 10Operations, 10Performance-Team, 10Wikidata, 10Wikidata-Query-Service: Errors trying to fetch RDF from Wikidata - https://phabricator.wikimedia.org/T207718 (10Smalyshev) @Imarlier nothing special in GC that can be linked to the errors. GC times seem to be low and unexceptional. [20:31:46] can elastic be queries from grafana? [20:31:50] *queried [20:38:18] (03CR) 10Alex Monk: [C: 032] acme_requests: Fix finalize_order() exception handling [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473757 (https://phabricator.wikimedia.org/T208967) (owner: 10Vgutierrez) [20:38:33] (03CR) 10Alex Monk: [C: 032] Release 0.7 [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473758 (https://phabricator.wikimedia.org/T208967) (owner: 10Vgutierrez) [20:38:36] 10Operations, 10cloud-services-team, 10Continuous-Integration-Infrastructure (shipyard), 10Nodepool, and 2 others: Phase out Nodepool from production - https://phabricator.wikimedia.org/T209361 (10hashar) [20:38:56] (03CR) 10Alex Monk: [C: 032] debian: Add release 0.7 to changelog [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473759 (https://phabricator.wikimedia.org/T208967) (owner: 10Vgutierrez) [20:39:14] 10Operations, 10cloud-services-team, 10Continuous-Integration-Infrastructure (shipyard), 10Nodepool, and 2 others: Phase out Nodepool from production - https://phabricator.wikimedia.org/T209361 (10hashar) [20:40:31] (03Merged) 10jenkins-bot: acme_requests: Fix finalize_order() exception handling [software/certcentral] (debian) - 
10https://gerrit.wikimedia.org/r/473757 (https://phabricator.wikimedia.org/T208967) (owner: 10Vgutierrez) [20:40:44] (03Merged) 10jenkins-bot: Release 0.7 [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473758 (https://phabricator.wikimedia.org/T208967) (owner: 10Vgutierrez) [20:41:06] (03Merged) 10jenkins-bot: debian: Add release 0.7 to changelog [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473759 (https://phabricator.wikimedia.org/T208967) (owner: 10Vgutierrez) [20:42:55] (03CR) 10jenkins-bot: acme_requests: Fix finalize_order() exception handling [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473757 (https://phabricator.wikimedia.org/T208967) (owner: 10Vgutierrez) [20:42:57] 10Operations, 10cloud-services-team, 10Continuous-Integration-Infrastructure (shipyard), 10Nodepool, 10Release-Engineering-Team (Kanban): Remove labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T209642 (10hashar) [20:43:05] (03CR) 10jenkins-bot: Release 0.7 [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473758 (https://phabricator.wikimedia.org/T208967) (owner: 10Vgutierrez) [20:43:35] (03CR) 10jenkins-bot: debian: Add release 0.7 to changelog [software/certcentral] (debian) - 10https://gerrit.wikimedia.org/r/473759 (https://phabricator.wikimedia.org/T208967) (owner: 10Vgutierrez) [20:50:12] (03PS1) 10Hashar: Make labnodepool1001.eqiad.wmnet a spare system [puppet] - 10https://gerrit.wikimedia.org/r/473838 (https://phabricator.wikimedia.org/T209642) [20:51:20] 10Operations, 10Performance-Team: Evaluate scalability and performance of PHP7 compared to HHVM - https://phabricator.wikimedia.org/T206341 (10Gilles) Have you diffed the output coming from HHVM and PHP7, to ensure that they're generating the same HTML for these pages? 
[20:56:14] 10Operations, 10cloud-services-team, 10Continuous-Integration-Infrastructure (shipyard), 10Nodepool, and 2 others: Remove labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T209642 (10hashar) [20:57:21] (03PS3) 10Bstorm: sonofgridengine: add cronrunner role for stretch grid [puppet] - 10https://gerrit.wikimedia.org/r/473833 (https://phabricator.wikimedia.org/T200557) [20:57:46] 10Operations, 10cloud-services-team, 10Continuous-Integration-Infrastructure (shipyard), 10Nodepool, and 2 others: Phase out Nodepool from production - https://phabricator.wikimedia.org/T209361 (10hashar) [21:02:03] (03PS1) 10Hashar: ci: stop monitoring zmq on Jenkins [puppet] - 10https://gerrit.wikimedia.org/r/473846 (https://phabricator.wikimedia.org/T209361) [21:03:28] 10Operations, 10cloud-services-team, 10Continuous-Integration-Infrastructure (shipyard), 10Nodepool, and 2 others: Phase out Nodepool from production - https://phabricator.wikimedia.org/T209361 (10hashar) [21:05:38] !log Stopped nodepool on labnodepool1001.eqiad.wmnet . Service is no more used. 
T209361 T209642 [21:05:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:05:42] T209361: Phase out Nodepool from production - https://phabricator.wikimedia.org/T209361 [21:05:43] T209642: Remove labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T209642 [21:06:58] !log Deleting Nodepool instances on contintcloud T209361 [21:07:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:09:33] (03PS1) 10Herron: peopleweb: include rsyslog kafka_shipper [puppet] - 10https://gerrit.wikimedia.org/r/473847 (https://phabricator.wikimedia.org/T205852) [21:12:31] 10Operations, 10cloud-services-team, 10Continuous-Integration-Infrastructure (shipyard), 10Nodepool, and 2 others: Phase out Nodepool from production - https://phabricator.wikimedia.org/T209361 (10hashar) [21:13:19] 10Operations, 10monitoring, 10Patch-For-Review: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (10Dzahn) [21:13:21] 10Operations, 10monitoring, 10Patch-For-Review: rename tegmen to icinga2001 and reinstall it with stretch - https://phabricator.wikimedia.org/T208824 (10Dzahn) 05Open>03Resolved [21:15:08] (03CR) 10Bstorm: [C: 032] sonofgridengine: add cronrunner role for stretch grid [puppet] - 10https://gerrit.wikimedia.org/r/473833 (https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm) [21:18:32] 10Operations, 10cloud-services-team, 10Continuous-Integration-Infrastructure (shipyard), 10Nodepool, and 2 others: Phase out Nodepool from production - https://phabricator.wikimedia.org/T209361 (10hashar) [21:20:17] Can anyone help me find the right person or place to ask about files and directories on mwmaint1002? Everything in my home directory there has disappeared. ~/home-terbium/ and everything else is gone, and all I have left is .bash_history and .gitignore. [21:27:33] Trey314159: eh.. i can look at that. 
what is your user name [21:27:50] @mutante tjones [21:28:32] 10Operations, 10cloud-services-team, 10Continuous-Integration-Infrastructure (shipyard), 10Nodepool, and 2 others: Remove labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T209642 (10hashar) [21:29:06] 10Operations, 10cloud-services-team, 10Continuous-Integration-Infrastructure (shipyard), 10Nodepool, and 2 others: Remove labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T209642 (10hashar) [21:31:40] 10Operations, 10cloud-services-team, 10Continuous-Integration-Infrastructure (shipyard), 10Nodepool, and 2 others: Remove labnodepool1001.eqiad.wmnet - https://phabricator.wikimedia.org/T209642 (10hashar) a:05hashar>03None CI no more rely on the service that is hosted on labnodepool1001.eqiad.wmnet. I... [21:34:56] Trey314159: i can confirm the issue and i see other users still have their home-terbium and i don't see an obvious reason yet what deleted it. i checked backups and in the last backup they are already gone.. so it doesn't seem to be very new. do you know when you saw them last time? [21:36:27] No database record found for: /home/tjones/home-terbium [21:36:56] that is when i try to let it search through older backups as well [21:37:15] mutante: thanks for checking! Last time I did work there was in Sept. My home directory was last modified on Oct 18, when I was at a conference, so I don't think it was done by me. [21:38:07] I may have moved files out of home-terbium/ (I just don't remember), are there any other files in the latest backup. Particularly something with reindex/reindexing in the file/directory name? [21:38:41] Trey314159: i see a directory called "reindex" in a backup from mwmaint1001 (not 1002) [21:40:17] mutante: that's the one I want. I can't log into mwmaint1001 (is it still around?) 
[21:40:26] Trey314159: Oct 18/19 is when i replaced mwmaint1001 with 1002 [21:40:33] that matches the date [21:40:38] let me restore the one from 1001 [21:40:53] no, that is gone but i have Bacula console [21:41:14] hold on [21:41:42] thanks, mutante! Can I get my bash profile, too, by any chance? [21:42:26] yes, it's there [21:42:33] thanks! [21:43:47] BTW, looking around, I noticed a few other users with home directory time stamps near mine also had empty directories (with no home-terbium/). Maybe others failed to migrate as well, but haven't noticed yet? [21:45:54] i'll check, would be weird, it was a loop over all users [21:46:36] i need a moment to get the restore going, i'll let you know as soon as i have something [21:47:19] Thanks again! [21:48:07] PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [21:51:55] Trey314159: cd /home/tjones/restored/tjones/ on mwmaint1002 [21:52:31] mutante: thanks—looks like everything is there. Much appreciated! [21:52:48] glad Bacula actually worked :) [21:52:57] and didnt even take that long [21:53:22] i'll look at the rest of /home and compare [21:58:55] !log mwmaint1002 - restoring entire /home of mwmaint1001 from Bacula (job queued and to tmp dir, not directly into /home) [21:58:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:00:18] (03PS1) 10Imarlier: wmf-config: Enable wgMFNoindexPages for 6 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473889 (https://phabricator.wikimedia.org/T206497) [22:01:43] (03CR) 10jenkins-bot: [V: 04-1] wmf-config: Enable wgMFNoindexPages for 6 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473889 (https://phabricator.wikimedia.org/T206497) (owner: 10Imarlier) [22:02:17] PROBLEM - puppet last run on icinga1001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [22:04:15] 10Operations, 10ops-eqsin, 10Traffic: cp5001 unreachable since 2018-07-14 17:49:21 - https://phabricator.wikimedia.org/T199675 (10RobH) Ok, picking this back up! I emailed into our support case 91912127436 > Support, > > This was dropped and not picked back up, so I'm trying to determine the status now. >... [22:05:11] (03PS2) 10Imarlier: wmf-config: Enable wgMFNoindexPages for 6 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473889 (https://phabricator.wikimedia.org/T206497) [22:08:30] (03CR) 10Imarlier: "If you don't mind, would appreciate a review..." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/473889 (https://phabricator.wikimedia.org/T206497) (owner: 10Imarlier) [22:09:25] PROBLEM - HHVM rendering on mw2185 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:10:23] RECOVERY - HHVM rendering on mw2185 is OK: HTTP OK: HTTP/1.1 200 OK - 74641 bytes in 0.309 second response time [22:11:02] mutante: bad news—all of my files are restored, but they are all empty! [22:11:31] RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational [22:17:35] RECOVERY - puppet last run on icinga1001 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [22:18:42] Trey314159: oh.. yea. so that should mean that the Bacula job is queued but not done yet. still hopeful they will show up and will look at job status [22:19:03] mutante: ok, thanks. I'll take a look later tonight. [22:19:52] ok, same here, i am also waiting for the restore of the entire /home and the part that the file structure gets created before the content is normal, afaict [22:20:18] 10Operations, 10ops-eqiad: rack/setup/install/deploy labsdb1009-labsdb1011 - https://phabricator.wikimedia.org/T136860 (10bd808) [22:20:20] Trey314159: btw, it's about exactly 50% of users that have a home-terbium and the others dont.. 
[22:20:26] but not all of them existed on terbium [22:20:45] will get back to it in a bit [22:21:09] mutante: cool. hopefully it was just me, then. [22:34:59] PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [22:41:43] RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational [22:53:14] (03PS3) 10Cwhite: icinga: manage permissions for replicated files [puppet] - 10https://gerrit.wikimedia.org/r/473789 (https://phabricator.wikimedia.org/T208824) [22:56:53] (03CR) 10Cwhite: "Thinking further on this changeset, it does not make sense to manage status.dat as it is regenerated on restart." [puppet] - 10https://gerrit.wikimedia.org/r/473789 (https://phabricator.wikimedia.org/T208824) (owner: 10Cwhite) [23:05:53] PROBLEM - puppet last run on icinga1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:08:35] PROBLEM - Check systemd state on ruthenium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [23:10:23] icinga1001 ran manually in 39s without issues ... [23:11:01] RECOVERY - puppet last run on icinga1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:11:55] RECOVERY - Check systemd state on ruthenium is OK: OK - running: The system is fully operational [23:21:42] So I'm not really sure what to suggest for further debugging https://phabricator.wikimedia.org/T209656#4752391 (user gets 503. apache puts in logstash "Failed to read FastCGI header" ) [23:21:43] tips appreciated :) [23:28:37] PROBLEM - puppet last run on icinga1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:54:55] mutante: my home dir is wiped on mwmaint1002 too... [23:55:21] would be glad to get those scripts back...