Explore the possibility of using incomplete preferences to solve the shutdown problem.
Here's a document that I wrote for some mentoring I did last year: https://docs.google.com/document/d/1TMY08qzhwOTfXyp04T_2fOmEZK-O9us_xSGkF5OAiO0/edit?usp=sharing. Much the same still applies.
www.elliott-thornley.com
Taking on one or more of the subprojects listed in this document: https://docs.google.com/document/d/1TMY08qzhwOTfXyp04T_2fOmEZK-O9us_xSGkF5OAiO0/edit?usp=sharing
Briefly present an objection to my proposal (https://docs.google.com/document/d/1BOS6_U4K9lfhBuY96p_4PPgXuh_HdvwIuFB_4QC-DrI/edit?usp=sharing): i.e. make an argument that it wouldn't work. (350 words max)
OR:
Briefly present an objection to my proposed training regime (https://docs.google.com/document/d/1VpLn32sqolhvdGn8B4He4xIcUN2waAnhW5EZNP18Vws/edit?usp=sharing): i.e. make an argument that it wouldn't work. (350 words max)
The project is to do speculative analytic philosophy to core concepts about agency, mind, and goal-pursuit. We'll bring in criteria (constraints and desiderata) from the nature of agency and from the engineering goal of creating a corrigible strong mind https://arbital.com/p/hard_corrigibility/. We'll look at the demands that these criteria make on our concepts, and find better concepts. See https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html
https://tsvibt.blogspot.com/
Hopefully they will do their own thinking, and maybe look at philosophical literature if there is any relevant writing.
What do you think of corrigibility https://arbital.com/p/hard_corrigibility/ ? Is it a specific, coherent way a mind can be? If so, what does that look like, exactly? If not, why not?
No attachments