Multi-variable linear regression



Predicting exam score: regression using one input (x)

one-variable or one-feature 

We need to extract good features; there can be anywhere from dozens to tens of thousands of features.


Predicting exam score:

regression using two inputs (x1, x2)

Hypothesis

H(x) = Wx + b

H(x1,x2) = w1x1 + w2x2 + b


Cost function

The cost function stays the same; only the hypothesis needs to change.


Multi-variable

H(x1,x2,x3,...xn) = w1x1 + w2x2 + w3x3 + ... + wnxn + b



Matrix multiplication

- basic linear algebra


H(X) = WX + b  (W as a row vector, X as a column vector)


W vs X

Mathematically, both W and X are defined in a similar form, as column vectors.

Then W is transposed.

Inner product form: H(X) = W^T X + b
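
As a concrete illustration, here is a minimal NumPy sketch of the matrix-form hypothesis; the data, weights, and bias below are made-up values, not the lecture's. When samples are stacked as rows of a data matrix X, the per-sample inner product W^T x becomes the matrix product XW:

# Minimal sketch of the matrix-form hypothesis for multi-variable regression.
# The data, weights, and bias below are made-up values for illustration.
import numpy as np

X = np.array([[73., 80.],        # 3 samples, 2 features each (e.g., two quiz scores)
              [93., 88.],
              [89., 91.]])       # shape (3, 2)
W = np.array([[0.5],
              [0.5]])            # shape (2, 1): one weight per feature
b = 1.0

H = X.dot(W) + b                 # shape (3, 1): one prediction per sample
print(H)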




How to minimize cost


Hypothesis and Cost

Simplified hypothesis

H(x) = Wx  ( b=0)


What does cost(W) look like?

 * W=1, cost(W) = 0

 * W=0, cost(W) = 4.67

 * W=2, cost(W) = 4.67

Let's plot cost(W) as a function of W.


The goal is to find the W at the bottom (center) of the curve, where the cost is minimal!
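
The three values above can be reproduced with a short script. A minimal sketch, assuming the lecture's toy data x = [1, 2, 3], y = [1, 2, 3] (an assumption on my part, based on the numbers):

# Minimal sketch reproducing the cost(W) values listed above.
# Assumes the toy data x = [1, 2, 3], y = [1, 2, 3] and H(x) = W*x (b = 0).
import numpy as np

x = np.array([1., 2., 3.])
y = np.array([1., 2., 3.])

def cost(W):
    # cost(W) = (1/m) * sum_i (W*x_i - y_i)^2
    return np.mean((W * x - y) ** 2)

for W in (0.0, 1.0, 2.0):
    print(W, cost(W))   # 0.0 -> 4.67, 1.0 -> 0.0, 2.0 -> 4.67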


Gradient descent algorithm

(gradient: slope)      (descent: going downward)

* Minimize cost function

* Gradient descent is used in many minimization problems

* For a given cost function, cost(W,b), it will find W,b to minimize cost

* It can be applied to more general functions: cost(w1, w2, ...)

Several parameters are also possible.


How does it work?

How would you find the lowest point?


Moving along the slope (downhill), we eventually converge to a point where the gradient = 0.


How does it work?

* Start with initial guesses

- Start at 0,0 (or any other value)

- Keep changing W and b a little bit to try to reduce cost(W, b)

* Each time you change the parameters, select the gradient that reduces cost(W, b) the most

* Use the derivative (differentiation) to find that direction

* Repeat

* Do so until you converge to a local minimum
* Has an interesting property
- Where you start can determine which minimum you end up in


Dividing the cost by m or by 2m makes no difference to where the minimum is; 2m is often used so that the 2 cancels when differentiating.


Compute the slope at the current point; since the update is W := W - α * dcost/dW, a negative slope increases W, and vice versa.


Gradient descent algorithm
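
The slide's formula is not reproduced in these notes, but the standard update rule in this notation is W := W - α * (1/m) * Σ_i (W*x_i - y_i)*x_i, where α is the learning rate (it comes from differentiating the cost written with a 1/2m factor). A minimal sketch, reusing the toy data from above; the learning rate and number of steps are my own arbitrary choices:

# Minimal gradient-descent sketch for H(x) = W*x on the toy data above.
# The learning rate and number of steps are arbitrary illustrative choices.
import numpy as np

x = np.array([1., 2., 3.])
y = np.array([1., 2., 3.])

W = 5.0          # deliberately bad initial guess
alpha = 0.1      # learning rate

for step in range(100):
    # gradient of (1/2m) * sum_i (W*x_i - y_i)^2 with respect to W
    gradient = np.mean((W * x - y) * x)
    W = W - alpha * gradient

print(W)   # converges close to 1.0, where cost(W) = 0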




Convex function


In a case like this (a non-convex function), gradient descent may not converge to the global minimum.



To apply gradient descent to linear regression safely, we should check that the cost function is a convex function.
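
A quick check for the simplified case (my own addition): with H(x) = Wx, cost(W) = (1/m) * Σ_i (W*x_i - y_i)^2, so the second derivative is d²cost/dW² = (2/m) * Σ_i x_i² ≥ 0. The cost is therefore convex in W, and gradient descent reaches the single global minimum.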

Predicting exam score: regression

x(hours)    y(score)

We learn from the training data set.

The data is labeled; y takes values over a range (continuous values).


After learning, give it an x and it returns a y.


The (linear) hypothesis assumes that some straight line fits the data:

H(x) = Wx + b


Which line is the one we were looking for?

Which hypothesis is better?

Compare the hypothesis with the data points by measuring the distance between them (= cost function, also called loss function).


Cost function

* How well does the line fit our (training) data?

H(x) - y  -> not good (positive and negative errors cancel out).

(H(x) - y)^2 is the standard form.

m is the number of data points.

Learning, for linear regression, means minimizing the cost.
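
Written out with the symbols above, the cost function is cost(W, b) = (1/m) * Σ_{i=1..m} (H(x_i) - y_i)², with H(x) = Wx + b; squaring keeps every error positive and penalizes large errors more heavily.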



Goal : Minimize cost

minimize cost(W,b)

Get the W and b that achieve the minimum.
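
A minimal sketch of this minimization in TensorFlow 1.x style (the API in use when these notes were written); the toy data, learning rate, and step count are my own placeholder choices:

# Minimal linear-regression sketch (TensorFlow 1.x API).
# Data and hyperparameters are placeholder choices for illustration.
import tensorflow as tf

x_data = [1., 2., 3.]
y_data = [1., 2., 3.]

W = tf.Variable(tf.random_normal([1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

hypothesis = W * x_data + b                             # H(x) = Wx + b
cost = tf.reduce_mean(tf.square(hypothesis - y_data))   # mean squared error

train = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(2001):
        _, cost_val, W_val, b_val = sess.run([train, cost, W, b])
    print(cost_val, W_val, b_val)   # cost near 0, W near 1, b near 0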



Basic concepts

* What is ML?

* What is learning?

- supervised

- unsupervised

* What is regression?

* What is classification?


Machine Learning

* Limitations of explicit programming

(Explicit programming: the developer explicitly tells the program, "in this situation, behave like this.")

(Features like spam filtering are hard to build this way, because the rules cannot all be made explicit.)

 - spam filter : many rules

 - Automatic driving : too many rules


* Machine learning : "Field of study that gives computers the ability to learn without being explicitly programmed" Arthur Samuel(1959)


Supervised / Unsupervised learning

* Supervised learning : 

- learning with labeled examples = training set.

An example training set for four visual categories:

cat, dog, mug, hat.

The training data is labeled, e.g., as "cat".


* Unsupervised learning: un-labeled data

- Google news grouping

- Word clustering


 Supervised learning

 * Most common problem type in ML

- Image labeling: learning from tagged images

- Email spam filter: learning from labeled (spam or ham) email

- Predicting exam score: learning from previous exam scores and time spent


Training data set

We train on labeled data to build a model.

Given an input x, the model outputs a value y.

The training data set is essential.


AlphaGo

AlphaGo learned from records of Go games.


Types of supervised learning

 * Predicting final exam score based on time spent

- regression

y has a wide (continuous) range.

 * Pass/non-pass based on time spent   

- binary classification

 * Letter grade (A, B, C, E and F) based on time spent

- multi-label classification


Predicting final exam score based on time spent

x(hours)    y(score)

- regression

Pass/non-pass based on time spent

- binary classification


Audience

* Want to understand basic machine learning(ML)

* No/weak math/computer science background

- Math at the level of y = Wx + b (i.e., y = ax + b) is enough.

* Want to use ML as black-box with basic understanding

* Want to use Tensorflow and Python(optional)


Goals

Basic understanding of machine learning algorithms

 * Linear regression, Logistic regression (classification)

 * Neural networks, Convolutional Neural Network, Recurrent Neural Network


Solve your problems using machine learning tools

 * Tensorflow and Python


Course structure

 * About 10 min lecture


I am going to read a paper recommended by Professor 이필규 of Inha University.

http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html


Human-level control through deep reinforcement learning 

It is a paper in Nature with the title above.

The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behavior, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.


We set out to create a single algorithm that would be able to develop a wide range of competencies on a varied range of challenging tasks - a central goal of general artificial intelligence that has eluded previous efforts. To achieve this, we developed a novel agent, a deep Q-network (DQN), which is able to combine reinforcement learning with a class of artificial neural network known as deep neural networks. Notably, recent advances in deep neural networks, in which several layers of nodes are used to build up progressively more abstract representations of the data, have made it possible for artificial neural networks to learn concepts such as object categories directly from raw sensory data. We use one particularly successful architecture, the deep convolutional network, which uses hierarchical layers of tiled convolutional filters to mimic the effects of receptive fields - inspired by Hubel and Wiesel's seminal work on feedforward processing in early visual cortex - thereby exploiting the spatial correlations present in images, and building in robustness to natural transformations such as changes of viewpoint or scale.


We consider tasks in which the agent interacts with an environment through a sequence of observations, actions and rewards. The goal of the agent is to select actions in a fashion that maximizes cumulative future reward. More formally, we use a deep convolutional neural network to approximate the optimal action-value function.



which is the maximum sum of rewards r_t discounted by γ at each time-step t, achievable by a behaviour policy π = P(a|s), after making an observation (s) and taking an action (a) (see Methods) [19].
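
For reference, the optimal action-value function the paper refers to (reproduced here in the same plain-text notation) is Q*(s, a) = max_π E[ r_t + γ*r_{t+1} + γ²*r_{t+2} + ... | s_t = s, a_t = a, π ], i.e. the best achievable expected discounted return after observing state s and taking action a.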

Reinforcement learning is known to be unstable or even to diverge when a nonlinear function approximator such as a neural network is used to represent the action-value function. This instability has several causes: the correlations present in the sequence of observations, the fact that small updates to Q may significantly change the policy and therefore change the data distribution, and the correlations between the action-values (Q) and the target values r + γ * max_{a'} Q(s', a').

We address these instabilities with a novel variant of Q-learning, which uses two key ideas. First, we used a biologically inspired mechanism termed experience replay that randomizes over the data, thereby removing correlations in the observation sequence and smoothing over changes in the data distribution (see below for details). Second, we used an iterative update that adjusts the action-values (Q) towards target values that are only periodically updated, thereby reducing correlations with the target.
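
As a rough illustration of the experience-replay idea only (not the paper's actual implementation), a minimal Python sketch:

# Rough sketch of experience replay (illustration only, not the paper's code).
import random
from collections import deque

replay_buffer = deque(maxlen=100000)   # buffer capacity is an arbitrary choice

def store(state, action, reward, next_state):
    # store one transition e_t = (s_t, a_t, r_t, s_{t+1})
    replay_buffer.append((state, action, reward, next_state))

def sample_minibatch(batch_size=32):
    # sample uniformly at random, breaking correlations in the observation sequence
    return random.sample(replay_buffer, batch_size)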


While other stable methods exist for training neural networks in the reinforcement learning setting, such as neural fitted Q-iteration [24], these methods involve the repeated training of networks de novo on hundreds of iterations. Consequently, these methods, unlike our algorithm, are too inefficient to be used successfully with large neural networks. We parameterize an approximate value function Q(s, a; θ_i) using the deep convolutional neural network shown in Fig. 1, in which θ_i are the parameters (that is, weights) of the Q-network at iteration i. To perform experience replay we store the agent's experiences e_t = (s_t, a_t, r_t, s_{t+1}) at each time-step t in a data set D_t = {e_1, ..., e_t}. During learning, we apply Q-learning updates on samples (or minibatches) of experience (s, a, r, s') ~ U(D), drawn uniformly at random from the pool of stored samples. The Q-learning update at iteration i uses the following loss function:
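
The loss function referred to above (reproduced from the paper in the same plain-text notation) is

L_i(θ_i) = E_{(s,a,r,s') ~ U(D)} [ ( r + γ * max_{a'} Q(s', a'; θ_i^-) - Q(s, a; θ_i) )² ]

where γ is the discount factor, θ_i are the Q-network parameters at iteration i, and θ_i^- are the target-network parameters, which are only updated to θ_i every C steps and held fixed between updates.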


Here is how to install PyCharm on macOS.

I have also tried this on Windows, but the Mac environment seems better, so I am switching over.


The installation steps on Windows are not much different, so this can serve as a reference there as well.


The reason for installing PyCharm is that coding in Python without a dedicated IDE is cumbersome.

It helps to think of the pairing as:

Java -> Eclipse

Python -> PyCharm


https://www.jetbrains.com/pycharm/?fromMenu 

Go to the site and install the PyCharm build that matches your OS.

The Community edition is free, so click on the Community edition.


Install the rest by following the setup steps.



Now then, let's get into machine learning in earnest with PyCharm!

