I have long been interested in a multivariate statistical procedure called "Factor Analysis", which takes a bunch of variables and "reduces" them into a fewer number of factors that hopefully will adequately model the data. Each factor represents bearings that tend to sell together (if a customer buys some of one bearing, they are likely to buy some of one or more others). Another way to think of a "factor" is as if they are "classes" of bearings, they more-or-less go along in the same basket...
In general, what we seek when using Factor Analysis are clean & descriptive factors that show a high correlation of bearings within that factor group as well as having each factor distinctive from the others. Correlation is measured by the correlation coefficient ("C.C.", the second column in each table below), here all C.C.'s are positive (but they can and typically do in many datasets range from -1.00 to + 1.00). In general, in the "social sciences" a C.C. of 0.3 is considered "significant", in that the two variable measured against each other would have a "moderate" relationship. I am not looking for moderate relationships, however, I am looking for strong ones, and C.C.'s of about 0.7 or so are pretty strong, with 0.8 being even better and 0.9 almost a straight line relationship (charts below will illustrate these C.C.'s).
Before I go further into Factor Analysis, here is a brief overlook of our data as well as what correlation coefficients look like!
The below chart shows the sales of our top 248 bearings. Our sales are "exponentially distributed", much like the famous "80/20 Rule" that you read about (how your top 20% of top pieces represent 80% of sales, etc.). Actually, our data is more skewed than that, we are about 88/12 or so. The first is all 248 bearings studied, note how important our top selling piece (KB-6007-RS) is! The second chart is the same but with the outlier KB-6007-RS piece removed, the second chart looks much more like a classic exponential distribution.
Here are some examples of what different correlation coefficients look like (I made these numbers up and plotted them, but the correlation coefficients are accurate). Note how close to a straight line 0.95 is, you can probably easily understand why I like to see high C.C.'s in my work! A set of high C.C.'s will give me an idea of how many pieces to buy of each item in that group (factor), for example if we want to buy 500 pieces of our best-selling Daewoo Tico piece, I can be confident in buying, say, 200 of our next best-seller for Tico. Imagine that each of the below graphs shows sales of each piece vs. its factor for a one month period:
Onward. Basic info on the programs and dataset:
-- Program: S-Plus v. 6.1.2 (2002)
-- 248 bearings (variables), representing some 79% of our sales, (total of +/- 1200 pieces)
-- 400 customers (representing some 97% of our sales), total of 700 or so customers
-- Time period: 2011 - April 2013
All data comes from my MS Access analytical database. That data comes up from Peru on MS Excel in a different format. I took the Access data and made a "crosstab query", and found that Access limits crosstabs to 250 columns. OK, then I looked at our top 248 top selling bearings. I transferred this data to Excel, as S-Plus works with Excel. After I had S-Plus' version of a worksheet, I was then able to execute the Factor Analysis routine.
Now that you have an idea of our overall sales data and what C.C.'s look like, we return to look at the results of the Factor Analysis I did on our dataset.
Explanation of below results:
-- I chose bearings that loaded highly (high C.C.) with each factor.
-- I chose 16 factors to be extracted.
-- "Bearing Ref" is our internal coding, the first two alphanumerics are the brand.
-- Bearing coding is quite irregular! It is like having to learn languages...
-- "C.C." is the correlation coefficient of each piece to its Factor group.
-- "Factor" is the factor number, Number 1 is the most important.
-- "Eigenv." means eigenvalue, here equivalent to how important (loosely explained...).
-- "Comments" shows factor description or other info of note
-- "2011-13 Rank" is importance of each piece (sales, see highlighted bearing below).
-- "2013 Rank" is just for 2013 sales, there are changes, so I monitor them.
Factor 1 just below are (almost) all bearings for "Daewoo Tico", a small Korean & very common car in Peru, there are three bearings for the similar "Daewoo Matiz" that uses a few different pieces (here three). Factor 1 has an eigenvalue of 14.9 (if I had run the routine to include all 248 possible factors, they would add up to 248, but I just chose to look at the 16 most important, so Factor 1 is (crudely) "worth" about 14.9 times an average bearing's sales in this dataset).
Some of these C.C.s are very high, I highlight in light blue background values over 0.95 (rarely seen in the social sciences). I do not have any other data from other businesses, so I do not have any idea whether or not businesses see such strong relationships among their products... Readers, care to share comments?
Some of our biggest sellers did not correlate well within just one factor (example: see last chart just above of our IL-212001). My judgement is that these bearings sold to customers who bought lots of different kinds of bearings... Many of our customers tend to specialists to one degree of another. I mark three of these bearings in red in the tables below the term "highest ldg" means that these pieces do not fit well in that factor group, but it is the best fit... You can note that the C.C.'s are rather low for those three pieces, again, I think it is because customers bought these three pieces with a variety of other bearings, not just for the same brand of bearing, vehicle model or type of vehicle (eg a bus).
This work is unfinished! Note that I have made no comments because I am still trying to figure out how all of these pieces fit together!
High Factor Loadings -- Ameru Bearings (Data 1 May 2013)
Below is Factor 2, note that I need to do more work to interpret this one better. They are all wheel bearings, double-row type and hub & bearing assemblies (latter has a double-row bearing inside a steel housing).
|DEE90A||0.474||2||14.4||Nissan -- highest ldg||2||10|
Factor 3, more work as well...
Factor 4 is almost all "tapered roller bearings" ("Timken type"), for wheels of commercial vehicles. The first is an exception (it is for a truck, but it is a clutch bearing) while the last piece is an automotive wheel bearing that slipped in here...
|KB30207||0.758||4||13.8||trbs for Com Vehs +|
|KB30208||0.603||4||13.8||trbs for Com Vehs +|
|KB30210||0.722||4||13.8||trbs for Com Vehs +|
|KB32011X||0.689||4||13.8||trbs for Com Vehs +|
|KB32207||0.843||4||13.8||trbs for Com Vehs +|
|KB32209||0.737||4||13.8||trbs for Com Vehs +|
|KB33012||0.846||4||13.8||trbs for Com Vehs +|
|KB33013||0.508||4||13.8||trbs for Com Vehs +|
|KB3780F1.20||0.525||4||13.8||trbs for Com Vehs +|
|KBHC6307TH||0.803||4||13.8||trbs for Com Vehs +|
|KO02474.20||0.676||4||13.8||trbs for Com Vehs +|
|KO28584.21||0.764||4||13.8||trbs for Com Vehs +|
|KO30210||0.855||4||13.8||trbs for Com Vehs +|
|KO30211||0.737||4||13.8||trbs for Com Vehs +|
|KO30212||0.785||4||13.8||trbs for Com Vehs +|
|KO32207||0.608||4||13.8||trbs for Com Vehs +|
|KO3579.25||0.711||4||13.8||trbs for Com Vehs +|
Factor 5 is double-row bearings, hub & bearing assemblies and three unusual sizes of KBC bearings, the last piece is KBC as well, but is a double-row bearing. Item IL-111001 is a double-row piece for Hyundai Accent, it is our 6th best seller both for this year 2013 as well as for the longer period 2011 - 2013.
|DE42450.02070||0.555||5||13.6||Factor 6 too.|
Factor 7, bearings for old cars, unusual pieces, etc.:
|IL710001||0.683||7||11.6||housing for bearing…|
|MBSet10||0.571||7||11.6||old car rear wheels|
|MBSetM80||0.554||7||11.6||old car rear wheels|
|MBSetM9||0.651||7||11.6||old car rear wheels|
Factor 8 is "bread & butter" pieces for Korean cars (wheel bearings). FYI, "carnecitas" is Peruvian slang for nice selling pieces ("little meaty ones").
|IL111002||0.601||8||10.4||K. carnecitas for Korean||54|
|IL111003||0.876||8||10.4||K. carnecitas for Korean||15||29|
|IL111004||0.635||8||10.4||K. carnecitas for Korean||63|
|IL111006||0.829||8||10.4||K. carnecitas for Korean||120|
|IL111009||0.847||8||10.4||K. carnecitas for Korean||26|
|IL112001||0.588||8||10.4||K. carnecitas for Korean||94|
|IL113002||0.91||8||10.4||K. carnecitas for Korean||13|
|IL113006||0.549||8||10.4||K. carnecitas for Korean||133|
|IL113015||0.891||8||10.4||K. carnecitas for Korean||17|
|IL113017||0.917||8||10.4||K. carnecitas for Korean||81|
|ILH0400.1||0.596||8||10.4||Factor 10 too…|
|KB11749.10||0.752||8||10.4||K. carnecitas for Korean||82|
|KB38KW01||0.438||8||10.4||Hyundai, highest ldg||34|
|KB44649.10||0.512||8||10.4||K. carnecitas for Korean|
|KB45449/10||0.733||8||10.4||K. carnecitas for Korean||111|
|KB69349.10||0.647||8||10.4||K. carnecitas for Korean||60|
Factor 9 is our cheap large tapered bearings from ZWZ (China), these are trailer bearings mostly. Note high correlations within this group. These bearings did not correlate at all with other brands... This hints at just buying these bearings once in a while from this supplier, only few (4!) other pieces sell well for us.
|ZW212049.11||0.924||9||8.7||ZWZ trailer bearings||67|
|ZW218248.10||0.925||9||8.7||ZWZ trailer bearings||50|
|ZW32219||0.829||9||8.7||ZWZ trailer bearings||56|
|ZW33213||0.924||9||8.7||ZWZ trailer bearings||165|
|ZW518445.10||0.864||9||8.7||ZWZ trailer bearings||38|
|ZW580.572||0.829||9||8.7||ZWZ truck bearing|
|ZW594A.592A||0.862||9||8.7||ZWZ truck bearing||140|
|ZW663.653||0.885||9||8.7||ZWZ trailer bearings||83|
|ZW683.672||0.779||9||8.7||ZWZ truck bearing||64|
|ZW749.742||0.568||9||8.7||ZWZ trailer bearings|
Other than ZWZ bearings, we are fairly weak in big truck bearings, we have to skip down to Factor 12 to see more. The items with "truck" in column 5 are either BIG bearings or moderate sized pieces for big trucks (pinion bearing or transmission).
|ZW749.742||0.473||12||6.5||truck brg, Factor 9|
KBC makes a few pieces that few others make, for Korean vehicles, Factor 16 shows this well, note the low eigenvalue of 3.6, this explains relatively little of our overall sales...
|KB30306C||0.703||16||3.6||K. oddball trbs for K.|
|KB30307C||0.702||16||3.6||K. oddball trbs for K.|
|KB30308D||0.375||16||3.6||K. oddball trbs for K.|
|KBTR458020||0.628||16||3.6||K. oddball trbs for K.|
S-Plus is kind enough to advise if the number of factors chosen is a "good model" or not, using a statistical notion of "p-value". In this case, I need to get a p-value of over about 0.20 or so for the model to be pretty descriptive of the whole dataset (248 bearings).
With the above 16 factors extracted, my p-value was zero. So, I tried extracting larger numbers of factors, the highest number I tried was 27 factors. Factor 27 had an eigenvalue of 0.88 or so, less than one, so that looked to be about all of the useful relationships I was likely to find in each of the 248 bearings and their correlation coefficient with each factor. Even extracting 27 factors, however, the p-value is still zero... But, the three below factors (numbers 19, 22 and 23) did show some decent correlations among some pieces and their factor. Note Factor 22 finally broke out the Daewoo Matiz pieces from Factor 1. Factor 19 shows me that the wheel bearings for older Toyota Hi-Ace (a van used a lot in public transport in Lima) sold similarly in both KBC and Koyo brands. Factor 23 are two Koyo ("KO" prefix) pieces that are a different type of bearing (belt-tensioner bearings, as in the bearing used in fanbelts for alternators, etc.).
Factor Eigenv. Comments
|(27 Factors…)||19||3.6||KB & KO Toyota Hi-Ace|
|(27 Factors…)||22||3.1||Daewoo Matiz|
|(27 Factors…)||23||2.4||KOPU1262 & PU2462|