Meet the troublemaker of the group.
This one learns not by being told the answers, but by trying, failing, and getting rewards or penalties.
Like a video game player figuring out how to win by losing 100 times first.
A robot learns to walk. It stumbles, falls, adjusts, and eventually starts moonwalking like MJ.
Every good move gets +1 point. Every fall gets -5. Over time, it optimizes for high scores.
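To make the scoring idea concrete, here is a minimal sketch in Python. It is not a real robot controller: the two action names and their fall probabilities are made up for illustration, and the learner is a simple epsilon-greedy averager (mostly pick the best-looking action, occasionally try a random one). Only the +1 and -5 rewards come from the example above.

```python
import random

# Reward scheme from the text: +1 for a good move, -5 for a fall.
GOOD_MOVE_REWARD = 1
FALL_PENALTY = -5

# Assumption: made-up chance that each (hypothetical) walking style ends in a fall.
FALL_PROBABILITY = {"careful_step": 0.1, "reckless_step": 0.5}


def take_step(action, rng):
    """Return the reward for one attempted step."""
    fell = rng.random() < FALL_PROBABILITY[action]
    return FALL_PENALTY if fell else GOOD_MOVE_REWARD


def learn_to_walk(num_steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner that tracks the average reward of each action."""
    rng = random.Random(seed)
    value = {action: 0.0 for action in FALL_PROBABILITY}
    count = {action: 0 for action in FALL_PROBABILITY}
    for _ in range(num_steps):
        if rng.random() < epsilon:
            action = rng.choice(list(value))    # explore: try something random
        else:
            action = max(value, key=value.get)  # exploit: use the best estimate
        reward = take_step(action, rng)
        count[action] += 1
        # Incremental running average of the rewards seen for this action.
        value[action] += (reward - value[action]) / count[action]
    return value


values = learn_to_walk()
best_action = max(values, key=values.get)
print(values, best_action)
```

Under these assumed probabilities, the careful step averages about +0.4 per move and the reckless one about -2, so the learner should end up preferring the careful step after enough stumbles.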
Where it's used:
- Game playing (Chess, Go, Atari)
- Robotics
- Self-driving cars
- Stock trading bots
Key concepts:
- Agent (the learner)
- Environment (where it operates)
- Actions, States, Rewards
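Here is a rough sketch of how those pieces fit together in Python. The `Corridor` environment, the `Agent` class, the reward values, and the hyperparameters are all hypothetical choices for illustration. The learning rule is tabular Q-learning, one common option among many.

```python
import random


class Corridor:
    """Environment: a hypothetical 5-cell corridor with the goal in the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (state, reward, done)."""
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # assumed rewards: goal bonus, small step cost
        return self.state, reward, done


class Agent:
    """Agent: a tabular Q-learner that scores every (state, action) pair."""

    ACTIONS = (-1, 1)

    def __init__(self, num_states, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
        self.q = {(s, a): 0.0 for s in range(num_states) for a in self.ACTIONS}
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.ACTIONS)  # explore
        return max(self.ACTIONS, key=lambda a: self.q[(state, a)])  # exploit

    def learn(self, state, action, reward, next_state, done):
        best_next = 0.0 if done else max(self.q[(next_state, a)] for a in self.ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


env = Corridor()
agent = Agent(num_states=env.length)

for _ in range(500):          # episodes
    state = env.reset()
    for _ in range(100):      # cap on steps per episode
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        agent.learn(state, action, reward, next_state, done)
        state = next_state
        if done:
            break

# Greedy action the agent has learned for each non-goal cell.
policy = [max(Agent.ACTIONS, key=lambda a: agent.q[(s, a)]) for s in range(env.length - 1)]
print(policy)
```

The loop is the whole story in miniature: the agent sees a state, picks an action, the environment answers with a new state and a reward, and the agent updates its estimates. In this toy setup the learned policy should point right (+1) in every cell, since that is the shortest path to the goal.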
Think of a toddler learning not to touch the hot stove. After a few ouches 🔥, they learn what not to do.