Airtable - Public Listing Gallery

Elliott Thornley

Project Title

Incomplete preferences as solution to the shutdown problem

Advisor Affiliation

Global Priorities Institute, University of Oxford

Project Description

Explore the possibility of using incomplete preferences to solve the shutdown problem.

Here's a document that I wrote for some mentoring I did last year: https://docs.google.com/document/d/1TMY08qzhwOTfXyp04T_2fOmEZK-O9us_xSGkF5OAiO0/edit?usp=sharing. Much the same still applies.

About you

www.elliott-thornley.com

Project Proposal Attachment

Local preference

Project categories

Alignment Research

Participant roles

Taking on one or more of the subprojects listed in this document: https://docs.google.com/document/d/1TMY08qzhwOTfXyp04T_2fOmEZK-O9us_xSGkF5OAiO0/edit?usp=sharing

Mentee must-haves

Either:
- grad-student-level proficiency in formal decision theory (e.g. a master's in philosophy or economics, or a PhD in progress in one of those disciplines), or
- a good enough understanding of RL to evaluate (and possibly test) my proposed training regime

Candidate nice-to-haves

Mentee selection problem

Briefly present an objection to my proposal (https://docs.google.com/document/d/1BOS6_U4K9lfhBuY96p_4PPgXuh_HdvwIuFB_4QC-DrI/edit?usp=sharing): i.e. make an argument that it wouldn't work. (350 words max)

OR:

Briefly present an objection to my proposed training regime (https://docs.google.com/document/d/1VpLn32sqolhvdGn8B4He4xIcUN2waAnhW5EZNP18Vws/edit?usp=sharing): i.e. make an argument that it wouldn't work. (350 words max)

Participant time commitment (minimum hours/week)

Ideal team size

1-3

Team Status

Closed

Tsvi Benson-Tilsen

Project Title

Conceptual investigation of the core drivers of goal-achieving mental activity

Advisor Affiliation

Machine Intelligence Research Institute

Project Description

The project is to apply speculative analytic philosophy to core concepts about agency, mind, and goal-pursuit. We'll bring in criteria (constraints and desiderata) from the nature of agency and from the engineering goal of creating a corrigible strong mind (https://arbital.com/p/hard_corrigibility/). We'll look at the demands that these criteria make on our concepts, and find better concepts. See https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html

About you

https://tsvibt.blogspot.com/

Project Proposal Attachment

Local preference

Local (Berkeley, CA) would be a plus, but this is a medium rather than a strong preference.

Project categories

Alignment Research

Participant roles

Hopefully they will do their own thinking, and maybe look at philosophical literature if there is any relevant writing.

Mentee must-haves

The project will be centered around doing standard analytic philosophy, but with much more impatience to get to the core of things, and with more willingness to radically deconstruct preconceived ideas to set the stage for creating righter concepts. So a prerequisite is to have already been seriously struggling with philosophical questions around mind and agency. Having struggled with a bit of Quine, Wittgenstein, Anscombe, Fodor, Deacon, Heidegger, Bergson, etc. is some positive indicator. E.g.

Candidate nice-to-haves

Having read some of Yudkowsky's writing about alignment, e.g. Arbital articles. Having read some of my blog posts about AGI alignment.

Mentee selection problem

What do you think of corrigibility (https://arbital.com/p/hard_corrigibility/)? Is it a specific, coherent way a mind can be? If so, what does that look like, exactly? If not, why not?

Participant time commitment (minimum hours/week)

Ideal team size

1 or 2

Team Status

Open
