[00:06:23] 06Operations, 10Wikimedia-Mailing-lists: deactivate maint-announce - https://phabricator.wikimedia.org/T143760#2672597 (10Peachey88) [00:06:25] 06Operations: investigate shared inbox options - https://phabricator.wikimedia.org/T146746#2672596 (10Peachey88) [00:35:34] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 03Fundraising Sprint Rocket Surgery 2016, and 3 others: Banner not showing up on site - https://phabricator.wikimedia.org/T144952#2672632 (10AndyRussG) @awight it takes a while for webrequest logs to land into Hive.... Also, regard... [00:38:23] PROBLEM - cassandra service on maps-test2001 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed [00:38:43] PROBLEM - cassandra CQL 10.192.0.128:9042 on maps-test2001 is CRITICAL: Connection refused [00:41:10] 06Operations, 10Wikimedia-Mailing-lists: Reach out to Google about @yahoo.com emails not reaching gmail inboxes (when sent to mailing lists) - https://phabricator.wikimedia.org/T146841#2672637 (10Peachey88) [00:50:44] RECOVERY - cassandra service on maps-test2001 is OK: OK - cassandra is active [00:51:04] RECOVERY - cassandra CQL 10.192.0.128:9042 on maps-test2001 is OK: TCP OK - 0.036 second response time on port 9042 [00:52:41] (03PS1) 10MaxSem: wfLoadExtension( 'GeoData' ) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313154 [00:52:43] (03PS1) 10MaxSem: Kill $wmgEnableGeoSearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313155 [00:52:45] (03PS1) 10MaxSem: No reason to ever vary $wgGeoDataDebug by wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313156 [00:52:47] (03PS1) 10MaxSem: GeoData: get rid of wmg, part 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313157 [00:52:50] (03PS1) 10MaxSem: GeoData: get rid of wmg, part 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313158 [01:00:09] on noc.wikimedia.org pyball/lvs config link goes to nowhere, it isn't on config-master [01:06:20] (03PS1) 10MaxSem: Remove dead pybal link [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313162 [01:06:34] arseny92, thanks ^ [01:06:45] We're using etcd now [01:09:54] (03CR) 1020after4: "What's wrong with https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep ?" [puppet] - 10https://gerrit.wikimedia.org/r/312971 (https://phabricator.wikimedia.org/T146618) (owner: 1020after4) [01:15:15] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 03Fundraising Sprint Rocket Surgery 2016, and 3 others: Banner not showing up on site - https://phabricator.wikimedia.org/T144952#2672707 (10awight) > @awight it takes a while for webrequest logs to land into Hive.... Yes, come to... 
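For context on the "We're using etcd now" remark above: the pybal/LVS pool state that the dead noc.wikimedia.org link used to point at is read from etcd rather than from static files on config-master. A minimal sketch of inspecting such state with the python-etcd client; the endpoint and key prefix below are placeholders, not the actual production layout.

```python
# Sketch only: list pool state stored in etcd. Host and key prefix are
# illustrative placeholders, not the real production endpoint or path.
import etcd  # python-etcd

client = etcd.Client(host="etcd.example.org", port=2379)

# Recursively walk everything under a hypothetical pools prefix.
result = client.read("/example/pools", recursive=True)
for node in result.leaves:
    print(node.key, node.value)
```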
[01:25:51] PROBLEM - MariaDB Slave Lag: m3 on db1043 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1537.29 seconds [01:32:20] PROBLEM - Disk space on maps-test2003 is CRITICAL: DISK CRITICAL - free space: /srv 39008 MB (3% inode=99%) [01:36:44] (03PS1) 10MaxSem: throttle: remove expired exceptions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313166 [01:43:33] RECOVERY - MariaDB Slave Lag: m3 on db1043 is OK: OK slave_sql_lag Replication lag: 0.67 seconds [01:47:33] 06Operations, 03Interactive-Sprint: maps-test* hosts running low on space - https://phabricator.wikimedia.org/T146848#2672762 (10MaxSem) [01:47:50] gehel, just in case ^ [01:52:28] PROBLEM - Disk space on maps-test2003 is CRITICAL: DISK CRITICAL - free space: /srv 38423 MB (3% inode=99%) [01:57:00] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 03Fundraising Sprint Rocket Surgery 2016, and 3 others: Banner not showing up on site - https://phabricator.wikimedia.org/T144952#2672822 (10awight) I'm seeing something even stranger when I sort by timestamp. There are all 87-byte... [02:01:12] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 03Fundraising Sprint Rocket Surgery 2016, and 3 others: Banner not showing up on site - https://phabricator.wikimedia.org/T144952#2672851 (10awight) I'm still keeping the priority down to High, now that I've convinced myself that th... [02:08:01] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3788936 keys - replication_delay is 0 [02:12:28] PROBLEM - Disk space on maps-test2003 is CRITICAL: DISK CRITICAL - free space: /srv 38558 MB (3% inode=99%) [02:25:15] PROBLEM - puppet last run on analytics1044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [02:34:49] PROBLEM - Disk space on maps-test2003 is CRITICAL: DISK CRITICAL - free space: /srv 38125 MB (3% inode=99%) [02:39:22] PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [02:39:31] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [50.0] [02:42:00] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [02:52:12] RECOVERY - puppet last run on analytics1044 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [02:56:41] PROBLEM - Disk space on maps-test2003 is CRITICAL: DISK CRITICAL - free space: /srv 36989 MB (3% inode=99%) [03:05:16] PROBLEM - Disk space on maps-test2002 is CRITICAL: DISK CRITICAL - free space: /srv 37245 MB (3% inode=99%) [03:06:24] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [03:16:24] PROBLEM - Disk space on maps-test2003 is CRITICAL: DISK CRITICAL - free space: /srv 37071 MB (3% inode=99%) [03:22:23] PROBLEM - Disk space on maps-test2002 is CRITICAL: DISK CRITICAL - free space: /srv 38943 MB (3% inode=99%) [03:33:43] PROBLEM - Disk space on maps-test2003 is CRITICAL: DISK CRITICAL - free space: /srv 38889 MB (3% inode=99%) [03:42:09] PROBLEM - Disk space on maps-test2002 is CRITICAL: DISK CRITICAL - free space: /srv 36908 MB (3% inode=99%) [03:48:54] PROBLEM - puppet last run on cp3018 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [03:57:13] PROBLEM - puppet last run on maps-test2001 is CRITICAL: CRITICAL: Puppet has 7 failures. Last run 7 minutes ago with 7 failures. Failed resources (up to 3 shown): Exec[create_user-replication@maps2003-v4],Exec[create_user-replication@maps-test2002-v4],Exec[create_user-replication@maps-test2003-v4],Exec[create_user-replication@maps2002-v4] [03:59:26] PROBLEM - Disk space on maps-test2002 is CRITICAL: DISK CRITICAL - free space: /srv 38484 MB (3% inode=99%) [04:05:54] PROBLEM - Disk space on maps-test2003 is CRITICAL: DISK CRITICAL - free space: /srv 38382 MB (3% inode=99%) [04:09:28] PROBLEM - cassandra CQL 10.192.0.128:9042 on maps-test2001 is CRITICAL: Connection refused [04:09:30] PROBLEM - cassandra service on maps-test2001 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed [04:16:09] RECOVERY - puppet last run on cp3018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [04:21:40] RECOVERY - cassandra CQL 10.192.0.128:9042 on maps-test2001 is OK: TCP OK - 0.037 second response time on port 9042 [04:21:40] PROBLEM - Disk space on maps-test2002 is CRITICAL: DISK CRITICAL - free space: /srv 38075 MB (3% inode=99%) [04:21:41] RECOVERY - cassandra service on maps-test2001 is OK: OK - cassandra is active [04:22:03] RECOVERY - puppet last run on maps-test2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [04:26:00] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 725 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3790945 keys - replication_delay is 725 [04:33:31] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [04:38:21] PROBLEM - puppet last run on cp3049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [04:43:10] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 37373 MB (3% inode=99%) [04:45:59] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3779781 keys - replication_delay is 0 [04:51:22] PROBLEM - Disk space on maps-test2002 is CRITICAL: DISK CRITICAL - free space: /srv 39046 MB (3% inode=99%) [04:56:41] PROBLEM - puppet last run on maps-test2001 is CRITICAL: CRITICAL: Puppet has 7 failures. Last run 6 minutes ago with 7 failures. 
Failed resources (up to 3 shown): Exec[create_user-replication@maps2003-v4],Exec[create_user-replication@maps-test2002-v4],Exec[create_user-replication@maps-test2003-v4],Exec[create_user-replication@maps2002-v4] [04:57:52] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 36893 MB (3% inode=99%) [05:00:32] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [05:03:05] RECOVERY - puppet last run on cp3049 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:06:23] PROBLEM - Disk space on maps-test2002 is CRITICAL: DISK CRITICAL - free space: /srv 38459 MB (3% inode=99%) [05:20:14] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 37160 MB (3% inode=99%) [05:42:41] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 36722 MB (3% inode=99%) [05:44:45] (03CR) 10Luke081515: [C: 031] RESTBase configuration for olo.wikipedia [puppet] - 10https://gerrit.wikimedia.org/r/312808 (https://phabricator.wikimedia.org/T146612) (owner: 10MarcoAurelio) [06:02:27] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 38555 MB (3% inode=99%) [06:27:26] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 36773 MB (3% inode=99%) [06:33:35] PROBLEM - Disk space on maps-test2002 is CRITICAL: DISK CRITICAL - free space: /srv 37790 MB (3% inode=99%) [06:47:31] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 37845 MB (3% inode=99%) [06:51:29] RECOVERY - puppet last run on maps-test2001 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [06:54:58] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 37823 MB (3% inode=99%) [06:55:57] PROBLEM - Disk space on maps-test2002 is CRITICAL: DISK CRITICAL - free space: /srv 37607 MB (3% inode=99%) [07:05:50] (03PS3) 10Elukey: First draft of the Pivot UI's puppetization [puppet] - 10https://gerrit.wikimedia.org/r/312495 (https://phabricator.wikimedia.org/T138262) [07:10:08] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 38031 MB (3% inode=99%) [07:12:05] (03PS4) 10Elukey: First draft of the Pivot UI's puppetization [puppet] - 10https://gerrit.wikimedia.org/r/312495 (https://phabricator.wikimedia.org/T138262) [07:16:02] PROBLEM - Disk space on maps-test2002 is CRITICAL: DISK CRITICAL - free space: /srv 37931 MB (3% inode=99%) [07:26:33] PROBLEM - puppet last run on maps-test2001 is CRITICAL: CRITICAL: Puppet has 7 failures. Last run 6 minutes ago with 7 failures. Failed resources (up to 3 shown): Exec[create_user-replication@maps2003-v4],Exec[create_user-replication@maps-test2002-v4],Exec[create_user-replication@maps-test2003-v4],Exec[create_user-replication@maps2002-v4] [07:30:16] PROBLEM - Postgres Replication Lag on maps-test2002 is CRITICAL: CRITICAL - Rep Delay is: 2008.522632 Seconds [07:32:46] RECOVERY - Postgres Replication Lag on maps-test2002 is OK: OK - Rep Delay is: 0.0 Seconds [07:32:57] PROBLEM - puppet last run on stat1003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [07:37:40] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 38901 MB (3% inode=99%) [07:39:58] was reading info about the beta deployment labs cluster , then noticed how the Main Page lists contributions in footer , went to see the contribs of MediaWiki default , and seeing that the contribs lists a CIDR global IP block supposedly matching the non-registered "user". Not to mention also that I do not have a labs account per se yet and viewing the page as guest, and the [07:39:58] block is supposedly around since months (although i get it that labs are for testing and anything happening there on-wikis doesn't directly affect prod as the sites are testing sanbox), that smells like a good amount of checkuser-related bug or bug with Special:Contributions or anything related to that page such as the block logs, however if also considering that deployment cluster [07:39:58] is for testing code changes before deployment to production, that contradicts all of the above . Very funny but also dangerous if that somehow made it's way to production during any prior deployments [07:40:04] https://deployment.wikimedia.beta.wmflabs.org/wiki/Special:Contributions/MediaWiki_default [07:40:54] PROBLEM - Disk space on maps-test2002 is CRITICAL: DISK CRITICAL - free space: /srv 37711 MB (3% inode=99%) [07:48:39] PROBLEM - cassandra service on maps-test2001 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed [07:50:22] PROBLEM - cassandra CQL 10.192.0.128:9042 on maps-test2001 is CRITICAL: Connection refused [07:50:23] PROBLEM - puppet last run on bast3001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:51:02] RECOVERY - cassandra service on maps-test2001 is OK: OK - cassandra is active [07:52:52] RECOVERY - cassandra CQL 10.192.0.128:9042 on maps-test2001 is OK: TCP OK - 0.037 second response time on port 9042 [07:58:13] PROBLEM - Disk space on maps-test2002 is CRITICAL: DISK CRITICAL - free space: /srv 36976 MB (3% inode=99%) [07:59:52] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 37797 MB (3% inode=99%) [08:00:13] RECOVERY - puppet last run on stat1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:13:15] PROBLEM - Disk space on maps-test2002 is CRITICAL: DISK CRITICAL - free space: /srv 36880 MB (3% inode=99%) [08:15:03] !log twentyafterfour@iridium:/srv/phab/phabricator$ sudo bin/search index --type PhabricatorProject --force [08:15:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:17:48] RECOVERY - puppet last run on bast3001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [08:25:00] PROBLEM - Postgres Replication Lag on maps-test2003 is CRITICAL: CRITICAL - Rep Delay is: 5294.962329 Seconds [08:25:01] PROBLEM - Postgres Replication Lag on maps-test2002 is CRITICAL: CRITICAL - Rep Delay is: 5295.371405 Seconds [08:25:01] PROBLEM - Postgres Replication Lag on maps-test2004 is CRITICAL: CRITICAL - Rep Delay is: 5295.810351 Seconds [08:26:54] 06Operations, 03Interactive-Sprint: maps-test* hosts running low on space - https://phabricator.wikimedia.org/T146848#2672762 (10ArielGlenn) Note that we can't increase the number of inodes on these boxes on the fly (ext4), there's no room left for expansion of the lvms, and I've no idea what could be tossed f... 
[08:27:31] RECOVERY - Postgres Replication Lag on maps-test2003 is OK: OK - Rep Delay is: 0.0 Seconds [08:27:31] RECOVERY - Postgres Replication Lag on maps-test2002 is OK: OK - Rep Delay is: 0.0 Seconds [08:27:31] RECOVERY - Postgres Replication Lag on maps-test2004 is OK: OK - Rep Delay is: 0.0 Seconds [08:32:25] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 38887 MB (3% inode=99%) [08:34:42] (03CR) 10Hashar: "> What's wrong with https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep ?" [puppet] - 10https://gerrit.wikimedia.org/r/312971 (https://phabricator.wikimedia.org/T146618) (owner: 1020after4) [08:35:45] PROBLEM - Disk space on maps-test2002 is CRITICAL: DISK CRITICAL - free space: /srv 36554 MB (3% inode=99%) [08:37:46] PROBLEM - Postgres Replication Lag on maps-test2003 is CRITICAL: CRITICAL - Rep Delay is: 6060.498722 Seconds [08:40:17] RECOVERY - Postgres Replication Lag on maps-test2003 is OK: OK - Rep Delay is: 0.0 Seconds [08:50:42] PROBLEM - Disk space on maps-test2002 is CRITICAL: DISK CRITICAL - free space: /srv 38096 MB (3% inode=99%) [08:52:42] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 36843 MB (3% inode=99%) [08:55:13] 06Operations, 10Datasets-General-or-Unknown: Reboot snapshot servers - https://phabricator.wikimedia.org/T146127#2673480 (10ArielGlenn) 05Open>03Resolved snapshot1006,7 done. Closing. [08:55:42] PROBLEM - Disk space on maps-test2002 is CRITICAL: DISK CRITICAL - free space: /srv 38832 MB (3% inode=99%) [09:17:32] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 36452 MB (3% inode=99%) [09:42:19] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 39079 MB (3% inode=99%) [09:59:29] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0] [10:01:59] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [10:09:52] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 37857 MB (3% inode=99%) [10:14:11] PROBLEM - puppet last run on mw1214 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:39:12] RECOVERY - puppet last run on mw1214 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [10:42:03] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 39013 MB (3% inode=99%) [10:47:31] PROBLEM - puppet last run on cp3034 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [11:09:14] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 37735 MB (3% inode=99%) [11:11:57] 06Operations, 06Performance-Team, 10Thumbor: Archive file thumbs not working - https://phabricator.wikimedia.org/T145769#2673770 (10Gilles) [11:12:15] RECOVERY - puppet last run on cp3034 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [11:19:34] PROBLEM - Postgres Replication Lag on maps-test2002 is CRITICAL: CRITICAL - Rep Delay is: 15765.343901 Seconds [11:19:35] PROBLEM - Postgres Replication Lag on maps-test2004 is CRITICAL: CRITICAL - Rep Delay is: 15766.565523 Seconds [11:22:02] PROBLEM - Postgres Replication Lag on maps-test2003 is CRITICAL: CRITICAL - Rep Delay is: 15913.293485 Seconds [11:22:02] RECOVERY - Postgres Replication Lag on maps-test2002 is OK: OK - Rep Delay is: 0.0 Seconds [11:22:03] RECOVERY - Postgres Replication Lag on maps-test2004 is OK: OK - Rep Delay is: 0.0 Seconds [11:26:54] RECOVERY - Postgres Replication Lag on maps-test2003 is OK: OK - Rep Delay is: 0.0 Seconds [11:28:53] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 39103 MB (3% inode=99%) [11:35:22] PROBLEM - cassandra service on maps-test2001 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed [11:36:33] PROBLEM - cassandra CQL 10.192.0.128:9042 on maps-test2001 is CRITICAL: Connection refused [11:41:22] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [11:43:53] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [11:43:53] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 39032 MB (3% inode=99%) [11:47:14] (03PS1) 10Gilles: Add key for gilles' new laptop [puppet] - 10https://gerrit.wikimedia.org/r/313199 [12:11:06] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 38635 MB (3% inode=99%) [12:13:26] 06Operations, 06Performance-Team, 10Thumbor: thumbor imagemagick filling up /tmp on thumbor1002 - https://phabricator.wikimedia.org/T145878#2673819 (10Gilles) This should be easier to figure out if it reoccurs with manhole in place [12:15:51] 06Operations, 06Performance-Team, 10Thumbor: Archive file thumbs not working - https://phabricator.wikimedia.org/T145769#2673820 (10Gilles) [12:40:40] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 38796 MB (3% inode=99%) [12:41:00] 06Operations, 06Performance-Team, 10Thumbor: Figure out a way to live-debug running production thumbor processes - https://phabricator.wikimedia.org/T146143#2673985 (10Gilles) Sigh, firejail prevents manhole from working properly. Without firejail: ``` 2016-09-28 12:37:51 thumbor:DEBUG Installing manhole M... [12:46:26] 06Operations, 06Performance-Team, 10Thumbor: Figure out a way to live-debug running production thumbor processes - https://phabricator.wikimedia.org/T146143#2674011 (10Gilles) I'm not sure if it's the unix socket being blocked or some other thing like accessing the pid [13:10:36] PROBLEM - Disk space on maps-test2004 is CRITICAL: DISK CRITICAL - free space: /srv 38259 MB (3% inode=99%) [13:13:45] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:21:11] 06Operations, 06Performance-Team, 10Thumbor: Figure out a way to live-debug running production thumbor processes - https://phabricator.wikimedia.org/T146143#2674065 (10Gilles) It seems to actually work after all, albeit with namespaced pid 2. I need to compensate for that though, as all thumbor processes wil... [13:38:30] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:45:10] (03PS1) 10Urbanecm: [throttling] IP cap lift for eswiki on 2016-09-30 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313207 (https://phabricator.wikimedia.org/T146788) [13:45:50] 06Operations, 06Performance-Team, 10Thumbor: Figure out a way to live-debug running production thumbor processes - https://phabricator.wikimedia.org/T146143#2674114 (10Gilles) [13:47:14] Hi, could somebody deploy https://gerrit.wikimedia.org/r/#/c/313207/ as an emergency patch? The event is at 2016-09-30 so I can't schedule it for regular deploy... Thanks very much! [13:48:30] addshore: anomie ostriches aude twentyafterfour RoanKattouw Dereckson ^ [13:50:00] greg-g: ^ [13:50:22] I can't! [13:50:45] 06Operations, 06Performance-Team, 10Thumbor: Figure out a way to live-debug running production thumbor processes - https://phabricator.wikimedia.org/T146143#2674116 (10Gilles) Works now, with each thumbor process creating its socket by thumbor port: ``` vagrant@mediawiki-vagrant:~$ ls -al /tmp/manhole* srw... [13:56:09] addshore: Why? You're listed as deployer in SWAT windows (I know this is not SWAT; but every member of SWAT should be able to deploy everything everytime technically) so I poked you... [13:56:49] Urbanecm: technically I can, but I'm not at a laptop currently! [13:57:12] addshore: Oh, okay. [14:01:36] 06Operations, 06Performance-Team, 10Thumbor: Figure out a way to live-debug running production thumbor processes - https://phabricator.wikimedia.org/T146143#2674128 (10Gilles) [14:12:57] PROBLEM - cassandra-a service on restbase1009 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed [14:13:20] PROBLEM - cassandra-a SSL 10.64.48.120:7001 on restbase1009 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:IO::Socket::SSL: connect: Connection refused [14:13:21] PROBLEM - cassandra-a CQL 10.64.48.120:9042 on restbase1009 is CRITICAL: Connection refused [14:17:46] RECOVERY - cassandra-a service on restbase1009 is OK: OK - cassandra-a is active [14:18:21] RECOVERY - cassandra-a SSL 10.64.48.120:7001 on restbase1009 is OK: SSL OK - Certificate restbase1009-a valid until 2017-09-12 15:33:48 +0000 (expires in 349 days) [14:20:40] RECOVERY - cassandra-a CQL 10.64.48.120:9042 on restbase1009 is OK: TCP OK - 0.002 second response time on port 9042 [14:26:06] Hello [14:29:37] Hi Dereckson [14:32:27] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:36:07] Urbanecm: send a mail to greg@wikimedia.org they'll be able to allow a deployment / ask someone to deploy that [14:55:57] (03PS1) 10EBernhardson: [cirrus] Remove deprecated per-user poolcounter config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313211 [14:57:34] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:03:39] Dereckson: I already poked all SWAT deployers because they should be able to deploy this. Should I ask Greg for approval? 
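Returning to the manhole/firejail notes from T146143 above: manhole opens a Unix socket per process so a running daemon can be inspected live, and the sockets end up named by Thumbor port rather than by the (namespace-remapped) PID. A rough sketch of that pattern under those assumptions; the port value and socket path are illustrative, not the actual Thumbor integration.

```python
# Sketch of the "one manhole socket per thumbor port" idea from T146143.
# Port and path are illustrative; this is not the production Thumbor code.
import manhole

THUMBOR_PORT = 8827  # hypothetical per-process service port

# Keying the socket on the service port keeps sockets distinguishable even when
# firejail gives every sandboxed process the same namespaced PID.
manhole.install(socket_path="/tmp/manhole-%d" % THUMBOR_PORT)

# Attach from a shell with something like:
#   socat readline unix-connect:/tmp/manhole-8827
```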
[15:04:07] Urbanecm yes [15:04:18] Okay, going to email him... [15:05:35] Tthanks [15:05:36] thanks [15:25:47] (03PS2) 10Urbanecm: [throttling] IP cap lift for eswiki on 2016-09-30 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313207 (https://phabricator.wikimedia.org/T146788) [15:26:02] (03CR) 10BryanDavis: [C: 031] [throttling] IP cap lift for eswiki on 2016-09-30 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313207 (https://phabricator.wikimedia.org/T146788) (owner: 10Urbanecm) [15:26:15] greg-g: can I deploy that trivial config change? ^ [15:28:27] Urbanecm: I'm pretty sure we can get it deployed for you. This week is weird because we are in a soft deploy freeze because of low availability of root users to help if something goes horribly wrong. [15:28:59] I know, but what can happen when only throttle.php will be synced? I can't imagine anything... [15:29:04] bd808: ^ [15:30:08] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [50.0] [15:30:23] Urbanecm: actually things can and have gone horribly wrong with a small config change. The worst case scenario is a widespread HHVM cache overflow that would require bulk hhvm process restarts. [15:30:55] bd808: Hmm... [15:31:20] I know it sounds ridiculous [15:32:22] It's happened to me [15:32:39] Random innocent config file changes deployed with sync-file/sync-dir [15:32:54] bd808: yeah [15:32:56] But it caused enough HHVM processes to crash that there were user-visible outages [15:32:56] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [15:33:05] * bd808 has taken the whole cluster offline at least 3 times [15:33:21] not a fun feeling [15:35:48] Yeah... So what should I do? Should I email somebody (I still didn't do it...)? Or ask greg-g directly on IRC? Or decline the task completely? Hoping not... [15:37:09] I don't think this is funny. Maybe it can happen. But I hope it won't :) [15:37:49] Urbanecm you can ask greg-g directly on IRC if you want, you may get a quicker response. [15:38:19] You should ask greg-g on the -releng channel [15:38:29] Okay, going to join there... [15:38:36] ok thanks [15:38:42] I've pinged him already. We'll get it handled [15:38:49] oh [15:39:05] see the "yeah" response above [15:39:15] 15:32 < greg-g> bd808: yeah [15:39:17] :) [15:39:23] * greg-g is in a 1:1 [15:39:34] ping again if you need a more substantive [15:39:59] Oh, sorry, didn't see he responded [15:40:48] I'm confused. Should I do something? [15:41:17] Urbanecm: nope. just hang tight. [15:41:32] Okay, waiting for a poke :) [15:46:49] we're discussing the lack of babysitters in a backchannel [15:47:13] 06Operations, 10Wikimedia-Apache-configuration, 10Wikimedia-General-or-Unknown: Global robots.txt contains invalid empty line - https://phabricator.wikimedia.org/T146908#2674407 (10kerberizer) [15:47:52] 06Operations, 10Wikimedia-Apache-configuration, 10Wikimedia-General-or-Unknown: Global robots.txt contains invalid empty line - https://phabricator.wikimedia.org/T146908#2674419 (10kerberizer) [15:57:00] jynus hi, were all the tables converted to innodb on phabricator? 
[15:57:15] no [15:57:19] Oh [15:57:23] I asked for the ones that were aria [15:57:34] oh [15:57:36] they are a danger and could create similar issues [15:57:44] oh [15:58:04] 06Operations, 10Wikimedia-Apache-configuration, 10Wikimedia-General-or-Unknown: Global robots.txt contains invalid empty line - https://phabricator.wikimedia.org/T146908#2674407 (10Reedy) I see numerous blank lines before line 423 in https://en.wikipedia.org/robots.txt [15:58:08] I have to run, I'll be back here later tonight [15:59:05] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:59:14] Urbanecm: I'm going to hold off on your patch for a bit, but I will get it out for you before 2016-09-29T00:00Z [15:59:45] bd808: Okay. Thanks in advance. [16:01:57] jynus I filed https://phabricator.wikimedia.org/T146910 if that was OK :), to convert the rest of the tables to innodb [16:02:44] paladox, as a DBA I agree with you [16:02:53] Ok, thanks :) [16:02:54] and I complained about that in the past [16:02:56] 06Operations, 10Wikimedia-Apache-configuration, 10Wikimedia-General-or-Unknown: Global robots.txt contains invalid empty line - https://phabricator.wikimedia.org/T146908#2674478 (10kerberizer) >>! In T146908#2674452, @Reedy wrote: > I see numerous blank lines before line 423 in https://en.wikipedia.org/robot... [16:02:58] Oh [16:03:00] but things are not that easy [16:03:11] Oh, yep [16:03:16] for example, it is believed to potentially create a regression [16:03:27] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [50.0] [16:03:51] please see my comment on T146673#2674457 [16:03:52] T146673: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673 [16:04:01] jynus: yeah, re that, let's sync up (you and mukunda mostly) on what to do there [16:04:08] Ok [16:04:15] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 605 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3757320 keys - replication_delay is 605 [16:04:17] jynus: assuming next week after offsite :) [16:04:19] and add your subtask there only as a potential one for now [16:04:28] (why I haven't bothered you) [16:04:31] I am all for elastic if it works [16:04:35] * greg-g nods [16:04:47] just note that the conversion was made in a rush [16:04:54] a real one :-) [16:05:14] Note that upstream doesn't support elasticsearch 2.*, but supports 1.7.* [16:05:14] so let them know to communicate with me on the ticket [16:05:24] Unless upstream has done quite a bit of work, the elasticsearch integration was pretty poor the last time we tried [16:05:27] ^that is for greg [16:05:35] I have no idea what the difference is between those two versions, or if 2.* improves searching [16:05:37] s/we/chad/ [16:05:58] bd808 in December 2014 they improved support [16:06:07] but they are considering dropping support for it entirely [16:06:10] jynus: them = upstream? [16:06:23] and allow users to create an extension that supports elasticsearch [16:06:33] mmodel/mukunda [16:06:57] jynus: got it [16:07:15] for this week I am working, just mostly offline/async [16:07:26] let him know I am ok with questions [16:07:34] I will answer on the ticket [16:08:53] jynus: awesome, thanks! 
[16:08:58] twentyafterfour: ^ :) [16:09:10] bd808 I believe the current InnoDB thing could be improved [16:09:41] but whatever works best for end users [16:10:19] as in, literally this is the first time I have seen MySQL fulltext being used seriously [16:10:41] It was all the rage in 1997! [16:10:42] it worked too well for how limited it is :-P [16:11:14] so let's jump into elastic as soon as we can [16:12:17] 06Operations, 10Wikimedia-Apache-configuration, 10Wikimedia-General-or-Unknown: Global robots.txt contains invalid empty line - https://phabricator.wikimedia.org/T146908#2674560 (10kerberizer) >>! In T146908#2674525, @Reedy wrote: > Please feel free to submit a change request on gerrit to the operations/medi... [16:14:17] 06Operations, 10Wikimedia-Apache-configuration, 10Wikimedia-General-or-Unknown: Global robots.txt contains invalid empty line - https://phabricator.wikimedia.org/T146908#2674566 (10kerberizer) p:05Triage>03Normal a:03kerberizer [16:16:48] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3718986 keys - replication_delay is 0 [16:19:13] 06Operations, 07Puppet, 10Beta-Cluster-Infrastructure: grain-ensure erroneous mismatch with (bool)True vs (str)true - https://phabricator.wikimedia.org/T146914#2674583 (10hashar) [16:21:35] 06Operations, 07Puppet, 10Beta-Cluster-Infrastructure: grain-ensure erroneous mismatch with (bool)True vs (str)true - https://phabricator.wikimedia.org/T146914#2674598 (10hashar) [16:22:38] PROBLEM - puppet last run on elastic1027 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:26:59] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:48:01] RECOVERY - puppet last run on elastic1027 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:22:35] !log restarting db1069.s3 (stagnant replication) [17:22:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:39:57] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [17:40:30] greg-g: got it [18:10:09] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [18:21:23] (03PS35) 1020after4: Scap swat command [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306259 (https://phabricator.wikimedia.org/T142880) [18:21:51] (03PS36) 1020after4: Scap swat command [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306259 (https://phabricator.wikimedia.org/T142880) [18:25:58] (03CR) 1020after4: "@krinkle: I believe I addressed your concerns and this should be ready to merge once the core scap changes are live." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306259 (https://phabricator.wikimedia.org/T142880) (owner: 1020after4) [18:26:05] mutante__, around? 
[18:27:38] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [18:28:00] alchimista nope, he is on ops offsite and it would be 8pm there [18:28:29] nop, i'll poke him other time, thanks paladox [18:28:41] Your welcome [18:29:26] 06Operations, 10Ops-Access-Requests: Requesting access to stat1002 and stat1004 for nschaaf - https://phabricator.wikimedia.org/T146924#2674988 (10schana) [18:31:55] 06Operations, 10Ops-Access-Requests: Requesting access to stat1002 and stat1004 for nschaaf - https://phabricator.wikimedia.org/T146924#2675031 (10Nuria) Request approved on analytics' end [18:32:58] (03CR) 10Krinkle: Scap swat command (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306259 (https://phabricator.wikimedia.org/T142880) (owner: 1020after4) [18:35:58] 06Operations, 10Ops-Access-Requests: Requesting access to stat1002 and stat1004 for nschaaf - https://phabricator.wikimedia.org/T146924#2675038 (10DarTar) Approved from Research too. [18:42:52] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [18:45:26] (03PS37) 1020after4: Scap swat command [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306259 (https://phabricator.wikimedia.org/T142880) [18:46:10] (03CR) 1020after4: "@krinkle: updated the commit message. That was just an oversight (I already removed that arg a while ago)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306259 (https://phabricator.wikimedia.org/T142880) (owner: 1020after4) [18:46:37] (03CR) 1020after4: Scap swat command (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306259 (https://phabricator.wikimedia.org/T142880) (owner: 1020after4) [18:51:41] 06Operations, 07Puppet, 10Beta-Cluster-Infrastructure: grain-ensure erroneous mismatch with (bool)True vs (str)true - https://phabricator.wikimedia.org/T146914#2675080 (10hashar) Looks like the main reason we have `grain-ensure.py` is to execute salt commands without a master (file_config = local). Nowadays... [18:57:03] twentyafterfour: --changeid no longer exists either, right? It detects number or url as argument, not as named option [18:58:16] Krinkle: right, it's a positional arg now so no --changeid flag [19:01:08] (03Draft1) 10Paladox: phabricator: Reduce innodb_ft_min_token_size from 3 to 2 [puppet] - 10https://gerrit.wikimedia.org/r/313235 [19:01:12] (03Draft2) 10Paladox: phabricator: Reduce innodb_ft_min_token_size from 3 to 2 [puppet] - 10https://gerrit.wikimedia.org/r/313235 [19:03:06] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [19:26:29] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 03Fundraising Sprint Rocket Surgery 2016, and 3 others: Banner not showing up on site - https://phabricator.wikimedia.org/T144952#2675165 (10awight) Just a silly follow-up to my meta vs meta.m remarks: the banner we were investigati... 
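A small illustration of the "(bool)True vs (str)true" mismatch tracked in T146914 above: grain values read back from salt tend to arrive as strings, so a naive equality check against a Python bool reports a spurious mismatch. This is only a sketch of the failure mode and one possible normalisation, not the actual grain-ensure.py code or the upstream fix.

```python
# Illustration of the T146914 failure mode; not the real grain-ensure.py.
desired = True     # caller passes a Python bool
current = "true"   # grain value as returned from a string-typed lookup

print(desired == current)  # False -> reported as a mismatch even though they agree

# One possible normalisation before comparing (an assumption, not the upstream fix):
def normalise(value):
    return str(value).strip().lower()

print(normalise(desired) == normalise(current))  # True
```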
[19:32:50] (03Draft1) 10Paladox: phabricator: Enable innodb_buffer_pool_load_at_startup and innodb_buffer_pool_dump_at_shutdown [puppet] - 10https://gerrit.wikimedia.org/r/313240 [19:32:52] (03Draft2) 10Paladox: phabricator: Enable innodb_buffer_pool_load_at_startup and innodb_buffer_pool_dump_at_shutdown [puppet] - 10https://gerrit.wikimedia.org/r/313240 [19:36:21] (03CR) 10Paladox: "We currently set ft_min_word_len but that's myisam" [puppet] - 10https://gerrit.wikimedia.org/r/313235 (owner: 10Paladox) [19:37:59] (03PS3) 10Paladox: phabricator: Reduce innodb_ft_min_token_size from 3 to 2 [puppet] - 10https://gerrit.wikimedia.org/r/313235 [19:47:36] (03PS3) 10Hashar: beta: update deployment-tin IP [puppet] - 10https://gerrit.wikimedia.org/r/312654 (https://phabricator.wikimedia.org/T144006) [19:58:12] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:00:04] (03PS4) 10Hashar: beta: update deployment-tin IP and make it master [puppet] - 10https://gerrit.wikimedia.org/r/312654 (https://phabricator.wikimedia.org/T144006) [20:01:26] (03PS4) 10Paladox: phabricator: Reduce innodb_ft_min_token_size from 3 to 1 [puppet] - 10https://gerrit.wikimedia.org/r/313235 [20:01:54] (03PS5) 10Paladox: phabricator: Reduce innodb_ft_min_token_size from 3 to 1 [puppet] - 10https://gerrit.wikimedia.org/r/313235 (https://phabricator.wikimedia.org/T146673) [20:02:56] (03PS6) 10Paladox: phabricator: Reduce innodb_ft_min_token_size from 3 to 1 [puppet] - 10https://gerrit.wikimedia.org/r/313235 (https://phabricator.wikimedia.org/T146673) [20:03:21] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [20:14:38] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 689 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3725704 keys - replication_delay is 689 [20:16:28] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.58 seconds [20:29:46] (03PS1) 10Hashar: Bring back Zend PHP on deployment server [puppet] - 10https://gerrit.wikimedia.org/r/313305 (https://phabricator.wikimedia.org/T146286) [20:33:05] 06Operations, 06Release-Engineering-Team, 07Beta-Cluster-reproducible, 13Patch-For-Review: mwscript on jessie mediawiki fails - https://phabricator.wikimedia.org/T146286#2675450 (10hashar) For some reason the Zend packages are no more installed on Jessie, though we actually need them at least for mwscript... [20:38:57] (03CR) 10BryanDavis: [C: 032] [throttling] IP cap lift for eswiki on 2016-09-30 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313207 (https://phabricator.wikimedia.org/T146788) (owner: 10Urbanecm) [20:39:14] greg-g: ^ heads up. this is happening now [20:39:25] (03Merged) 10jenkins-bot: [throttling] IP cap lift for eswiki on 2016-09-30 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313207 (https://phabricator.wikimedia.org/T146788) (owner: 10Urbanecm) [20:39:31] 06Operations, 06Release-Engineering-Team, 07Beta-Cluster-reproducible, 13Patch-For-Review: mwscript on jessie mediawiki fails - https://phabricator.wikimedia.org/T146286#2675477 (10thcipriani) It looks like we removed php5 packages as part of the move to jessie: https://github.com/wikimedia/operations-pupp... 
[20:41:37] bd808: /me nods [20:43:10] <|L> greg-g: see https://phabricator.wikimedia.org/T146788 [20:43:15] <|L> no other way thatn doing it this way [20:43:20] <|L> *this week [20:43:45] pulled to mw1099 and nothing melted [20:45:10] bd808: +1 thanks for that :] [20:45:25] * apergos puts their phone in vibrate mode and hides it in the other room [20:46:33] ack. looks like logmsgbot is busted [20:46:56] !log scap sync-file wmf-config/throttle.php "IP cap lift for eswiki on 2016-09-30 (T146788)" [20:46:58] T146788: IP cap lift for es.wiki on 2016-09-30 - https://phabricator.wikimedia.org/T146788 [20:47:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:47:25] !log logmsgbot seems to be down: "error: [Errno 111] Connection refused" from scap sync-file [20:47:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:47:30] 06Operations, 10Beta-Cluster-Infrastructure, 07HHVM, 13Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2675507 (10hashar) I have dropped deployment-tin02 it was confusing people and create a deployment-tin which is now the master https://gerrit.wikimedia.... [20:49:13] * bd808 sees no active fires as a result of the config change [20:49:32] heh, this is a trivial change [20:49:59] PROBLEM - puppet last run on db1066 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:50:14] it's funny that Urbanecm chose to use -0:00, though [20:50:28] that typically means "I don't really know my timezone" [20:50:30] restarted it [20:50:48] !log restarted logmsgbot on neon [20:50:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:50:53] Testing log messages from tin [20:51:57] !log apergos is awesome and made the bot work again by restarting it [20:52:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:52:12] now the restart is in the log twice [20:52:19] oh no! [20:52:19] justsayin :-P [20:53:28] apergos: you forgot to say you were awesome in the first message. I had to fix that [20:53:29] well this seems like a good time to check back out again [20:53:41] I'ma gonna put myself in sleep mode in about 15 minutes [20:53:49] o/ [20:54:04] have a good (and quiet) one y'all [20:54:38] thanks apergos [20:54:53] yw [21:14:37] PROBLEM - HP RAID on ms-be1025 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [21:15:08] RECOVERY - puppet last run on db1066 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:17:07] RECOVERY - HP RAID on ms-be1025 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [21:31:39] (03PS1) 10Reedy: Remove old variable transfers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313321 (https://phabricator.wikimedia.org/T146945) [21:32:10] hasharAway: ^ just merge t hat [21:33:42] !log Fixed labs 205.21.68.10.in-addr.arpa. entry to remove another broken contintcloud name, unbreaking beta scap [21:33:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:33:53] hasharAway, ^ [21:35:53] thcipriani [21:36:10] Krenair: thank you! 
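On the logmsgbot errors above: the deploy tooling announces !log lines over a TCP connection to the logmsgbot service (hence the "[Errno 111] Connection refused" while the bot was down on neon), and the file sync itself still completed. A rough sketch of such an announce step; the function, host, and port are placeholders, not the actual scap code or listener.

```python
# Sketch of the announce step that failed with "[Errno 111] Connection refused"
# while logmsgbot was down. Host and port are illustrative placeholders.
import socket

def announce(message, host="logbot.example.org", port=9200):
    # Send one announce line to a tcpircbot-style TCP listener.
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            sock.sendall(message.encode("utf-8") + b"\n")
    except OSError as err:  # errno 111 (ECONNREFUSED) when the bot is not listening
        print("announce failed (%s); the file sync itself still completed" % err)

announce("!log scap sync-file wmf-config/throttle.php ...")
```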
[21:37:57] Other deployers can do this stuff too [21:38:17] though it's not exactly required knowledge to get deployment access [21:43:03] [10:39] was reading info about the beta deployment labs cluster , then noticed how the Main Page lists contributions in footer , went to see the contribs of MediaWiki default , and seeing that the contribs lists a CIDR global IP block supposedly matching the non-registered "user". Not to mention also that I do not have a labs account per se yet and viewing the page as [21:43:03] guest, and the block is supposedly around since months (although i get it that labs are for testing and anything happening there on-wikis doesn't directly affect prod as the sites are testing sanbox), that smells like a good amount of checkuser-related bug or bug with Special:Contributions or anything related to that page such as the block logs, however if also considering that [21:43:03] deployment cluster is for testing code changes before deployment to production, that contradicts all of the above . Very funny but also dangerous if that somehow made it's way to production during any prior deployments [21:43:03] [10:40] https://deployment.wikimedia.beta.wmflabs.org/wiki/Special:Contributions/MediaWiki_default [21:45:31] (03CR) 10Hashar: [C: 031] "Too late for me to git pull on the production server :D Thank you Reedy for the patch!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313321 (https://phabricator.wikimedia.org/T146945) (owner: 10Reedy) [21:46:27] arseny92: It's a fake edit done during wiki creation [21:46:30] Nothing to care about [21:47:52] Reedy: hm, the "This IP address is currently globally blocked" notice is definitely wrong though [21:48:07] Reedy, im not about the edit, im about the fact that the gblock log is shown for IP [21:48:15] i don't think this is checkuser-related, "MediaWiki default" definitely did not make that edit from that IP address [21:48:33] Isn't that WMF external ip range? [21:49:23] (globalblocking-contribs-notice: MediaWiki default) [21:54:23] <|L> lol, I blocked that range [21:54:33] <|L> but that blog is old [21:55:03] The same block message for 173.208.12.0/24 is displayed for every block at Special:GlobalBlockList if you try to go to any ip contribs [21:55:32] <|L> arseny92: Not anymore :P [21:55:35] <|L> (Global block log); 23:54 . . Luke081515 (talk | contribs | block) removed global block on User:173.208.12.0/24 ‎(outdated) [21:56:38] <|L> have a good night! [21:57:27] |L the message got replaced by a diff block lol [22:18:16] PROBLEM - puppet last run on aqs1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:36:00] RECOVERY - puppet last run on aqs1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [22:47:23] bd808: my inbox :( [22:47:42] Krenair: E_TOOMANYPATCHES? [22:47:46] yes [22:48:09] I tried to keep the changes reviewable [22:48:09] PROBLEM - puppet last run on ms-be1006 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [22:50:48] (03CR) 1020after4: [C: 031] phabricator: Reduce innodb_ft_min_token_size from 3 to 1 [puppet] - 10https://gerrit.wikimedia.org/r/313235 (https://phabricator.wikimedia.org/T146673) (owner: 10Paladox) [23:09:49] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 03Fundraising Sprint Rocket Surgery 2016, and 4 others: Banner not showing up on site - https://phabricator.wikimedia.org/T144952#2676013 (10DStrine) [23:13:19] RECOVERY - puppet last run on ms-be1006 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [23:34:02] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3680903 keys - replication_delay is 0