Tuesday, May 7, 2013

Factor Analysis Of Ameru Sales Data

My blog has suffered a bit because one of my computers was down (now fixed!) and because I have been spending time with our sales data, exploring it to see what I can learn.

I have long been interested in a multivariate statistical procedure called "Factor Analysis", which takes a bunch of variables and "reduces" them into a fewer number of factors that hopefully will adequately model the data.  Each factor represents bearings that tend to sell together (if a customer buys some of one bearing, they are likely to buy some of one or more others).  Another way to think of a "factor" is as if they are "classes" of bearings, they more-or-less go along in the same basket...

In general, what we seek when using Factor Analysis are clean & descriptive factors that show a high correlation of bearings within that factor group as well as having each factor distinctive from the others.  Correlation is measured by the correlation coefficient ("C.C.", the second column in each table below), here all C.C.'s are positive (but they can and typically do in many datasets range from -1.00 to + 1.00).  In general, in the "social sciences" a C.C. of 0.3 is considered "significant", in that the two variable measured against each other would have a "moderate" relationship.  I am not looking for moderate relationships, however, I am looking for strong ones, and C.C.'s of about 0.7 or so are pretty strong, with 0.8 being even better and 0.9 almost a straight line relationship (charts below will illustrate these C.C.'s).

***

Before I go further into Factor Analysis, here is a brief overlook of our data as well as what correlation coefficients look like!

The below chart shows the sales of our top 248 bearings.  Our sales are "exponentially distributed", much like the famous "80/20 Rule" that you read about (how your top 20% of top pieces represent 80% of sales, etc.).  Actually, our data is more skewed than that, we are about 88/12 or so.  The first is all 248 bearings studied, note how important our top selling piece (KB-6007-RS) is!  The second chart is the same but with the outlier KB-6007-RS piece removed, the second chart looks much more like a classic exponential distribution.



Here are some examples of what different correlation coefficients look like (I made these numbers up and plotted them, but the correlation coefficients are accurate).  Note how close to a straight line 0.95 is, you can probably easily understand why I like to see high C.C.'s in my work!  A set of high C.C.'s will give me an idea of how many pieces to buy of each item in that group (factor), for example if we want to buy 500 pieces of our best-selling Daewoo Tico piece, I can be confident in buying, say, 200 of our next best-seller for Tico.  Imagine that each of the below graphs shows sales of each piece vs. its factor for a one month period:




***

Onward.  Basic info on the programs and dataset:

--  Program: S-Plus v. 6.1.2 (2002)
--  248 bearings (variables), representing some 79% of our sales, (total of +/- 1200 pieces)
--  400 customers (representing some 97% of our sales), total of 700 or so customers
--  Time period: 2011 - April 2013

Procedure:

All data comes from my MS Access analytical database.  That data comes up from Peru on MS Excel in a different format.  I took the Access data and made a "crosstab query", and found that Access limits crosstabs to 250 columns.  OK, then I looked at our top 248 top selling bearings.  I transferred this data to Excel, as S-Plus works with Excel.  After I had S-Plus' version of a worksheet, I was then able to execute the Factor Analysis routine.

Now that you have an idea of our overall sales data and what C.C.'s look like, we return to look at the results of the Factor Analysis I did on our dataset.

Explanation of below results:

-- I chose bearings that loaded highly (high C.C.) with each factor.
-- I chose 16 factors to be extracted.
-- "Bearing Ref" is our internal coding, the first two alphanumerics are the brand.
--  Bearing coding is quite irregular!  It is like having to learn languages...
-- "C.C." is the correlation coefficient of each piece to its Factor group.
-- "Factor" is the factor number, Number 1 is the most important.
-- "Eigenv." means eigenvalue, here equivalent to how important (loosely explained...).
-- "Comments" shows factor description or other info of note
-- "2011-13 Rank" is importance of each piece (sales, see highlighted bearing below).
-- "2013 Rank" is just for 2013 sales, there are changes, so I monitor them.

Factor 1 just below are (almost) all bearings for "Daewoo Tico", a small Korean & very common car in Peru, there are three bearings for the similar "Daewoo Matiz" that uses a few different pieces (here three).  Factor 1 has an eigenvalue of 14.9 (if I had run the routine to include all 248 possible factors, they would add up to 248, but I just chose to look at the 16 most important, so Factor 1 is (crudely) "worth" about 14.9 times an average bearing's sales in this dataset).

Some of these C.C.s are very high, I highlight in light blue background values over 0.95 (rarely seen in the social sciences).  I do not have any other data from other businesses, so I do not have any idea whether or not businesses see such strong relationships among their products...  Readers, care to share comments?

Some of our biggest sellers did not correlate well within just one factor (example: see last chart just above of our IL-212001).  My judgement is that these bearings sold to customers who bought lots of different kinds of bearings...  Many of our customers tend to specialists to one degree of another.  I mark three of these bearings in red in the tables below the term "highest ldg" means that these pieces do not fit well in that factor group, but it is the best fit...  You can note that the C.C.'s are rather low for those three pieces, again, I think it is because customers bought these three pieces with a variety of other bearings, not just for the same brand of bearing, vehicle model or type of vehicle (eg a bus).

This work is unfinished!  Note that I have made no comments because I am still trying to figure out how all of these pieces fit together!

High Factor Loadings -- Ameru Bearings (Data 1 May 2013)

2011-13 2013
Rank Rank
Bearing Ref. C.C. Factor Eigenv. Comments
C13RCT283 0.211 1 14.9 Daewoo Tico 15
C3RCT283SA 0.76 1 14.9 Daewoo Tico 44
KB30204 0.545 1 14.9 Daewoo Matiz 37 28
KB30205 0.518 1 14.9 Daewoo Matiz 21 21
KB32004X 0.824 1 14.9 Daewoo Tico 18 24
KB33005 0.754 1 14.9 Daewoo Tico 12 17
KB6000.2RS 0.964 1 14.9 Daewoo Tico 4 5
KB6002.2RS 0.929 1 14.9 Daewoo Tico 14 18
KB6007C3 0.813 1 14.9 Daewoo Tico
KB6007.2RSC` 0.811 1 14.9 Daewoo Tico 3 16
KB6007.RS 0.954 1 14.9 Daewoo Tico 1 1
KB6204A 0.866 1 14.9 Daewoo Tico
KB6204.2RS 0.981 1 14.9 Daewoo Tico 8 9
KB6205.RS 0.98 1 14.9 Daewoo Tico 7 8
KB6206.20 0.867 1 14.9 Daewoo Tico 30 34
KB6207.HLC 0.778 1 14.9 Daewoo Tico
KB6302.2RS 0.92 1 14.9 Daewoo Tico 11 11
KB68149.11 0.549 1 14.9 Daewoo Matiz 9 6

Below is Factor 2, note that I need to do more work to interpret this one better.  They are all wheel bearings, double-row type and hub & bearing assemblies (latter has a double-row bearing inside a steel housing).

DE255237 0.649 2 14.4 Suzuki
DE27BWK04 0.652 2 14.4
DE27BWK04.2A 0.717 2 14.4 26
DE27BWK06A1 0.616 2 14.4 Daihatsu/Toyota
DE30BWK11 0.584 2 14.4
DE387038 0.705 2 14.4 Daihatsu/Toyota
DE387139 0.568 2 14.4 Toyota Probox 19 19
DEE90A 0.474 2 14.4 Nissan -- highest ldg 2 10
DEHUB042.32.82 0.713 2 14.4 22
IL141008 0.515 2 14.4 Toyota Probox
MB38BWD22 0.671 2 14.4 Toyota Probox 5 4
MB40BWD15 0.653 2 14.4 Nissan
MBMUB0814 0.645 2 14.4 27 23

Factor 3, more work as well...

CH3874 0.925 3 14.2 Toyota
DE27BWK03ABS 0.612 3 14.2
DE2DUF050.7 0.819 3 14.2
DE51KWH01 0.735 3 14.2
DE54KWH02 0.921 3 14.2 Toyota Hi-Ace 25 10
DEHUB042.32 0.745 3 14.2
IL141005 0.941 3 14.2
JV949100333 0.808 3 14.2
JV949100348 0.911 3 14.2
JV949100382 0.885 3 14.2
IL212001 0.463 3 14.2 highest ldg 3 2
KB800736 0.595 3 14.2
KB800738 0.823 3 14.2
KBF569214 0.702 3 14.2

Factor 4 is almost all "tapered roller bearings" ("Timken type"), for wheels of commercial vehicles.  The first is an exception (it is for a truck, but it is a clutch bearing) while the last piece is an automotive wheel bearing that slipped in here...

C458TKA3703 0.5 4 13.8 +…
KB30207 0.758 4 13.8 trbs for Com Vehs +
KB30208 0.603 4 13.8 trbs for Com Vehs +
KB30210 0.722 4 13.8 trbs for Com Vehs +
KB32011X 0.689 4 13.8 trbs for Com Vehs +
KB32207 0.843 4 13.8 trbs for Com Vehs +
KB32209 0.737 4 13.8 trbs for Com Vehs +
KB33012 0.846 4 13.8 trbs for Com Vehs +
KB33013 0.508 4 13.8 trbs for Com Vehs +
KB3780F1.20 0.525 4 13.8 trbs for Com Vehs +
KBHC6307TH 0.803 4 13.8 trbs for Com Vehs +
KO02474.20 0.676 4 13.8 trbs for Com Vehs +
KO28584.21 0.764 4 13.8 trbs for Com Vehs +
KO30210 0.855 4 13.8 trbs for Com Vehs +
KO30211 0.737 4 13.8 trbs for Com Vehs +
KO30212 0.785 4 13.8 trbs for Com Vehs +
KO32207 0.608 4 13.8 trbs for Com Vehs +
KO3579.25 0.711 4 13.8 trbs for Com Vehs +
MBM4074CW 0.779 4 13.8 +…

Factor 5 is double-row bearings, hub & bearing assemblies and three unusual sizes of KBC bearings, the last piece is KBC as well, but is a double-row bearing.  Item IL-111001 is a double-row piece for Hyundai Accent, it is our 6th best seller both for this year 2013 as well as for the longer period 2011 - 2013.

DE28BWK06D1 0.679 5 13.6
DE28BWK15 0.708 5 13.6 40
DE28BWK19 0.863 5 13.6
DE42450.02070 0.555 5 13.6 Factor 6 too.
DE513297 0.519 5 13.6 62
DE515013 0.955 5 13.6
DE498448 0.719 5 13.6
DE559060 0.698 5 13.6
IL111001 0.391 5 13.6 highest ldg 6 6
IL111004 0.544 5 13.6
IL223003 0.688 5 13.6
IL223028 0.859 5 13.6
KB6308.32 0.574 5 13.6
KB806023 0.811 5 13.6
KB806093 0.825 5 13.6
KBDT255237 0.669 5 13.6

Factor 7, bearings for old cars, unusual pieces, etc.:

IL131006 0.592 7 11.6 Peugeot
IL131008 0.797 7 11.6 Peugeot
IL710001 0.683 7 11.6 housing for bearing…
KB576079 0.506 7 11.6 KB oddball
KB62.28.2RS 0.677 7 11.6 KB oddball
KB63.28.2RS 0.639 7 11.6 KB oddball
KO30207 0.684 7 11.6
MBM0895W 0.685 7 11.6 Chrysler
MBSet10 0.571 7 11.6 old car rear wheels
MBSetM80 0.554 7 11.6 old car rear wheels
MBSetM9 0.651 7 11.6 old car rear wheels

Factor 8 is "bread & butter" pieces for Korean cars (wheel bearings).  FYI, "carnecitas" is Peruvian slang for nice selling pieces ("little meaty ones").

IL111002 0.601 8 10.4 K. carnecitas for Korean 54
IL111003 0.876 8 10.4 K. carnecitas for Korean 15 29
IL111004 0.635 8 10.4 K. carnecitas for Korean 63
IL111006 0.829 8 10.4 K. carnecitas for Korean 120
IL111009 0.847 8 10.4 K. carnecitas for Korean 26
IL112001 0.588 8 10.4 K. carnecitas for Korean 94
IL113002 0.91 8 10.4 K. carnecitas for Korean 13
IL113006 0.549 8 10.4 K. carnecitas for Korean 133
IL113015 0.891 8 10.4 K. carnecitas for Korean 17
IL113017 0.917 8 10.4 K. carnecitas for Korean 81
ILH0400.1 0.596 8 10.4 Factor 10 too…
KB11749.10 0.752 8 10.4 K. carnecitas for Korean 82
KB38KW01 0.438 8 10.4 Hyundai, highest ldg 34
KB44649.10 0.512 8 10.4 K. carnecitas for Korean
KB45449/10 0.733 8 10.4 K. carnecitas for Korean 111
KB69349.10 0.647 8 10.4 K. carnecitas for Korean 60

Factor 9 is our cheap large tapered bearings from ZWZ (China), these are trailer bearings mostly.  Note high correlations within this group.  These bearings did not correlate at all with other brands...  This hints at just buying these bearings once in a while from this supplier, only few (4!) other pieces sell well for us.

ZW212049.11 0.924 9 8.7 ZWZ trailer bearings 67
ZW218248.10 0.925 9 8.7 ZWZ trailer bearings 50
ZW32219 0.829 9 8.7 ZWZ trailer bearings 56
ZW33213 0.924 9 8.7 ZWZ trailer bearings 165
ZW518445.10 0.864 9 8.7 ZWZ trailer bearings 38
ZW580.572 0.829 9 8.7 ZWZ truck bearing
ZW594A.592A 0.862 9 8.7 ZWZ truck bearing 140
ZW663.653 0.885 9 8.7 ZWZ trailer bearings 83
ZW683.672 0.779 9 8.7 ZWZ truck bearing 64
ZW749.742 0.568 9 8.7 ZWZ trailer bearings

Other than ZWZ bearings, we are fairly weak in big truck bearings, we have to skip down to Factor 12 to see more.  The items with "truck" in column 5 are either BIG bearings or moderate sized pieces for big trucks (pinion bearing or transmission).

KB598AS.592 0.568 12 6.5 truck bearings…
KBTR6513051 0.424 12 6.5 truck bearings…
KO30309D 0.525 12 6.5 truck bearings…
KO30311D 0.713 12 6.5 truck bearings…
KO32207 0.56 12 6.5
MB88509 0.545 12 6.5
MB8MTH7205R 0.684 12 6.5 truck bearings…
MB309L 0.791 12 6.5 truck bearings…
MBUY1307TM 0.545 12 6.5 truck bearings…
NSB17.99 0.651 12 6.5
ZW749.742 0.473 12 6.5 truck brg, Factor 9

KBC makes a few pieces that few others make, for Korean vehicles, Factor 16 shows this well, note the low eigenvalue of 3.6, this explains relatively little of our overall sales...

KB30306C 0.703 16 3.6 K. oddball trbs for K.
KB30307C 0.702 16 3.6 K. oddball trbs for K.
KB30308D 0.375 16 3.6 K. oddball trbs for K.
KBTR458020 0.628 16 3.6 K. oddball trbs for K.

...

S-Plus is kind enough to advise if the number of factors chosen is a "good model" or not, using a statistical notion of "p-value".  In this case, I need to get a p-value of over about 0.20 or so for the model to be pretty descriptive of the whole dataset (248 bearings).

With the above 16 factors extracted, my p-value was zero.  So, I tried extracting larger numbers of factors, the highest number I tried was 27 factors.  Factor 27 had an eigenvalue of 0.88 or so, less than one, so that looked to be about all of the useful relationships I was likely to find in each of the 248 bearings and their correlation coefficient with each factor.  Even extracting 27 factors, however, the p-value is still zero...  But, the three below factors (numbers 19, 22 and 23) did show some decent correlations among some pieces and their factor.  Note Factor 22 finally broke out the Daewoo Matiz pieces from Factor 1.  Factor 19 shows me that the wheel bearings for older Toyota Hi-Ace (a van used a lot in public transport in Lima) sold similarly in both KBC and Koyo brands.  Factor 23 are two Koyo ("KO" prefix) pieces that are a different type of bearing (belt-tensioner bearings, as in the bearing used in fanbelts for alternators, etc.).

                                    Factor    Eigenv.       Comments

(27 Factors…) 19 3.6 KB & KO Toyota Hi-Ace
(27 Factors…) 22 3.1 Daewoo Matiz
(27 Factors…) 23 2.4 KOPU1262 & PU2462

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.