Dealing with unbalanced data for classification?
Anonymous
From a data perspective, oversampling and undersampling are the techniques to use. If the majority class has a lot of data (say, 10 million samples), undersampling it is an option, but it generally risks throwing away useful information. It is therefore usually preferable to use an oversampling algorithm such as SMOTE, which increases the number of minority-class samples by generating synthetic ones between existing minority samples and their nearest minority-class neighbours.

From an algorithm perspective, one should avoid Random Forest and neural-net techniques and stick to techniques like SVM. If the data is extremely unbalanced, with a class ratio of, say, 1:100, choose an anomaly-detection technique such as a one-class SVM, which learns the boundary of the majority class and flags everything outside it.
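To make the undersampling vs. oversampling trade-off concrete, here is a minimal pure-Python sketch, assuming a binary problem with feature rows stored as tuples. The function names `random_undersample` and `smote` and the toy data are hypothetical; `smote` implements only the basic SMOTE interpolation step and is not the imbalanced-learn implementation you would normally use in practice (`imblearn.over_sampling.SMOTE`).

```python
import math
import random


def random_undersample(X, y, majority_label, seed=0):
    """Randomly drop majority-class rows until the majority class is the same
    size as the other class (binary case). The dropped rows are simply lost."""
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    others = [i for i, label in enumerate(y) if label != majority_label]
    kept = sorted(rng.sample(majority, len(others)) + others)
    return [X[i] for i in kept], [y[i] for i in kept]


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` (Euclidean), excluding itself
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 20 majority points (label 0), 4 minority points (label 1).
X = [(float(i % 5), float(i // 5)) for i in range(20)]
X += [(10.0, 10.0), (10.5, 10.0), (10.0, 10.5), (10.5, 10.5)]
y = [0] * 20 + [1] * 4

# Option 1: undersample the majority class down to 4 rows (16 rows discarded).
X_under, y_under = random_undersample(X, y, majority_label=0)

# Option 2: oversample the minority class with 16 synthetic SMOTE points.
minority = [x for x, label in zip(X, y) if label == 1]
synthetic = smote(minority, n_new=16, k=3)
X_over = X + synthetic
y_over = y + [1] * len(synthetic)

print(y_under.count(0), y_under.count(1))  # 4 4
print(y_over.count(0), y_over.count(1))    # 20 20
```

Both options end up balanced, but undersampling keeps only 8 of the 24 original rows, while SMOTE keeps all of them and adds new minority points inside the region the minority class already occupies. Whichever you pick, resample only the training split so the test set keeps the real class ratio.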
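For the extreme-imbalance case, a one-class SVM is fitted on majority-class data only, and anything it rejects is treated as a candidate minority example. The sketch below assumes NumPy and scikit-learn are installed and uses scikit-learn's `OneClassSVM`; the synthetic 2-D data and the values `nu=0.05` and `gamma=0.1` are illustrative assumptions that would need tuning on real data.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical training set: only "normal" (majority-class) samples, 2-D Gaussian.
X_majority = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# nu: approximate upper bound on the fraction of training points treated as
# outliers. gamma: RBF kernel width. Both values are illustrative only.
detector = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.05)
detector.fit(X_majority)

# Score unseen points: +1 = looks like the majority class, -1 = outlier,
# i.e. a candidate minority-class example.
X_new = np.array([
    [0.0, 0.0],  # at the centre of the training data
    [8.0, 8.0],  # far from every training point
])
pred = detector.predict(X_new)
scores = detector.decision_function(X_new)  # positive = inlier, negative = outlier
print(pred, scores)

# Share of the training data the model still accepts as normal.
train_inlier_rate = float(np.mean(detector.predict(X_majority) == 1))
print(f"training points kept as inliers: {train_inlier_rate:.2%}")
```

`nu` plays roughly the role of the expected outlier fraction, so it is the main knob to tie to the class ratio; checking how many known minority rows receive `-1` on a validation split is a simple way to tune it.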