


Statistics notes







SAMPLING

Definitions
- Population = the whole group being investigated; where the sample is chosen from.
- Census = survey of the entire population (easy for small populations).
- Sample = selection from the entire population, used to draw conclusions about the whole population.
- Sampling units = the people/items to be sampled.
- Sample frame = list/map of all the sampling units in the population.
- Biased sample = doesn't represent the population fairly (avoided by random sampling and a bigger sample).
- Variability between samples: different samples will not give exactly the same data.

CENSUS
Pros: accurate, unbiased, representative
Cons: lots of data, time-consuming, expensive, can be impractical (access?)

SAMPLE
Pros: practical, less data, quick, easy, cheaper
Cons: can be unrepresentative/biased

RANDOM SAMPLE (each member equally likely to be chosen)
Pros: representative, unbiased
Cons: needs a sample frame, time-consuming, needs a large pop., not always convenient

STRATIFIED SAMPLE (useful when different groups are likely to give different answers)
- Don't stratify by the variable being investigated.
Pros: representative, can compare results of different groups, works for a pop. with different group sizes
Cons: time-consuming, not useful for hard-to-define groups

QUOTA SAMPLE (fixed amount chosen from each group)
Pros: quick, cheap, no sample frame
Cons: biased, not random, unrepresentative

SYSTEMATIC SAMPLE (choose items at regular intervals)
- Useful for large populations.
Pros: can be done by machine, sample easy to select, evenly sampled
Cons: every nth item may coincide with a pattern (biased), not strictly random, can be unrepresentative

CLUSTER SAMPLE (pop. divided into clusters; clusters chosen at random to sample)
- The closer the distribution of members within a cluster is to the whole pop., the less bias there is.
Pros: cheaper, representative if many small clusters are sampled, convenient
Cons: can be unrepresentative (biased), high sampling error

OPPORTUNITY SAMPLE, a.k.a. convenience sample (sample of people/members available at the time/place)
Pros: quick, easy, cheap, convenient, no sample frame
Cons: not random, unrepresentative, biased

JUDGEMENT SAMPLE (researcher uses own judgement to choose the sample)
- Useful for specific/obscure investigations.
Pros: easy, quick, may be the only suitable method
Cons:...
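
A rough sketch of how the random, systematic and stratified methods in the sampling notes above might be carried out. The function names, the seeded generator and the year-group figures are made up for illustration, and the stratified version assumes proportional allocation (each group's share of the sample matches its share of the population), which the notes do not state explicitly.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Random sample: every member of the sample frame is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, n, seed=None):
    """Systematic sample: pick a random start, then take items at regular intervals."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the sample frame")
    k = len(frame) // n           # sampling interval (every k-th item)
    rng = random.Random(seed)
    start = rng.randrange(k)      # random starting point within the first interval
    return [frame[start + i * k] for i in range(n)]


def stratified_sizes(strata_sizes, n):
    """Stratified sample (assumed proportional): number to take from each group.

    Rounding can make the total differ slightly from n for awkward group sizes.
    """
    total = sum(strata_sizes.values())
    return {name: round(n * size / total) for name, size in strata_sizes.items()}


if __name__ == "__main__":
    frame = list(range(1, 101))   # hypothetical sample frame: items numbered 1 to 100
    print(simple_random_sample(frame, 10, seed=1))
    print(systematic_sample(frame, 10, seed=1))
    # Hypothetical school of 400 pupils, sample of 40
    print(stratified_sizes({"Year 7": 120, "Year 8": 180, "Year 9": 100}, 40))
```

Once the numbers per group are known, a random sample of that size would be taken within each group.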


not random (biased); quality depends on the researcher's judgement, knowledge and reliability.

CLEANING DATA
Problems:
- outliers (include = distort/skew; ignore = inaccurate conclusion?)
- missing data values
- wrong format/order
- different symbols/units
Clean data by:
- correcting or removing outliers, missing data and inaccurate values, or recording the data again
- removing symbols/units
- putting data in the same format (m/cm, °C, words/letters)

SIMPLIFYING DATA
- Easier to spot overall trends, but masks some trends (less detail).
- e.g. combine/total categories.

GROUPING DATA
- Use when there's lots of data or it's spread out.
- Makes it easier to read, spot trends/patterns/distribution and compare classes.
- No groups should overlap.
- Class intervals (CI):
  - discrete: use hyphens, with gaps
  - continuous: use inequalities, with no gaps
  - smaller CI for data that's close together, larger CI for data that's spread out
- Class limits = upper bound (UB) and lower bound (LB).
- Class width (cw) = size of class:
  - discrete: LB of next class - LB of class
  - continuous: UB - LB
Cons:
- calcs are only estimates
- diagrams can be misleading
- loses accuracy of exact values

STEM & LEAF
- Quantitative data; shows distribution.
- Back-to-back: to compare 2 data sets (use 2 different keys).

POPULATION PYRAMIDS
- Show the distribution of ages of 2 groups within a population (percentages or numbers), by gender or country.
- y-axis = age group, x-axis = population.
- PYRAMID SHAPE: younger > older; high birth/death rate; short life expectancy.
- BARREL SHAPE: younger = older; low birth/death rate; long life expectancy.
- INVERTED PYRAMID SHAPE: younger < older; declining birth/death rate; increasing life expectancy.

FREQUENCY POLYGON
- Drawn from grouped continuous data.
- x-axis = midpoint (mp), y-axis = frequency.

CUMULATIVE FREQUENCY (cf)
- cf = running total of frequencies (number of values <= a given value).
CF step polygon:
- horizontal lines between points
- discrete data
- join points
by going across (->) then up (↑)
- height of each step = frequency
CF polygon:
- grouped, continuous data
- plot UB against cf; join with a smooth curve or straight lines
- start cf at zero
Estimating values:
- n/2 -> median (50%)
- n/4 and 3n/4 -> LQ (25%) and UQ (75%)
- "more than" -> subtract from total
- percentile -> calculate the % of the total

APPROPRIATE REPRESENTATION
- Line graph: quantitative data; shows trends over time.
- Bar chart: discrete, qualitative data; shows amounts; visually appealing; shows mode.
- Pie chart: single-variable data; any type of data; proportion comparisons; visually appealing.
- Box plot: distribution comparisons; IQR; skew.
- Frequency polygon: grouped data.
- CF polygon: continuous, grouped data.
- Step cf polygon: discrete data; shows distribution and frequencies.
- Histogram: continuous grouped data; lots of data.
- Scatter graph: bivariate data.
- Stem & leaf: discrete data; less data; median.

MISLEADING DIAGRAMS
- Pictograms: no key, squashed symbols, different symbols/sizes.
- Colour: brighter = stands out.
- Volume: 3D makes a section appear bigger.
- Graph: no labels on axes, axis unevenly scaled, truncated y-axis, 3D effect, angled (can make a line seem steeper).

SKEWNESS
- Positive skew: mean > median > mode (values above the median have a greater spread).
- Negative skew: mean < median < mode (values below the median have a greater spread).
- Symmetrical (no skew): mode = median = mean.
SKEW FORMULA
- Skew = 3(mean - median) / SD
- between 3 & -3
- (+) value = positive skew; (-) value = negative skew
- closer to ±3 = stronger skew

MEASURES OF AVG
- avg = measure of central tendency
- Mode -> value with the highest frequency; modal class = class with the highest frequency.
- Median -> middle value: (n + 1)/2 th value => discrete; n/2 th value => continuous grouped.
- Mean (a.k.a. arithmetic mean) -> Σx / n; from a frequency table, Σfx / Σf.
- Weighted mean -> Σ(w × v) / Σw, where v = value in each category and w = its weight.
- Geometric mean -> ⁿ√(v₁ × v₂ × ... × vₙ).
LINEAR TRANSFORMATION -> add/subtract a value from all data values; calc.
avg of the transformed data, then reverse the transformation.

CHANGES TO DATA (AVG)
- Mode: adding/removing data changes the mode (or makes it bimodal) only if it changes which values appear most.
- Median:
  - add a greater value or remove a smaller value => median increases (or stays the same)
  - add a smaller value or remove a greater value => median decreases (or stays the same)
  - add/remove one greater and one smaller value => no change
- Mean:
  - add a value greater than the mean or remove a value smaller than the mean => mean increases
  - add a value smaller than the mean or remove a value greater than the mean => mean decreases
  - add/remove a value equal to the mean => no change

CHOOSING APPROPRIATE AVG
Mode
Advantages: easy to use; always a data value; unaffected by extreme values; works for quantitative & qualitative data
Disadvantages: there may not be a mode (or there may be more than 1); not always representative; can't be used to calc. spread
Median
Advantages: easy to find in ordered data; unaffected by extreme values; best for skewed data
Disadvantages: may not be a data value; not always representative
Mean
Advantages: uses all the data; usually most representative; used to calc. skew & SD
Disadvantages: may not be a data value; always affected by outliers (can be distorted)

MEASURES OF SPREAD (or dispersion)
- Range = largest val. - smallest val.
- IQR -> spread of the middle 50% of the data; IQR = UQ - LQ.
- LQ -> 1/4 of the way through the data (25%); UQ -> 3/4 of the way through (75%).
  - discrete: LQ = 1/4 (n + 1)th val.; UQ = 3/4 (n + 1)th val.
  - grouped: LQ = 1/4 nth val.; UQ = 3/4 nth val.
- Percentiles => divide data into 100 parts.
- IPR -> interpercentile range; difference between 2 percentiles; IPR = larger percentile - smaller percentile.
- Deciles => divide data into 10 parts.
- IDR -> interdecile range (usually 1st & 9th deciles); IDR = 9th decile - 1st decile.
- SD -> measure of how far vals are from the mean (how spread out the data is).
  - discrete: SD = √( Σ(x - mean)² / n )
  - grouped (from a table): SD = √( Σf(x - mean)² / Σf )
  - smaller SD => vals closer to the mean; larger SD => vals further from the mean

COMPARING DATA SETS
- Compare an average and a measure of spread.
- mean -> use SD or range
- median -> use IQR or range, or a percentile difference (IPR or IDR)

STANDARDISED SCORE
- Used to compare individual vals from 2 samples of data (how far above/below the mean each val. is).
- Standardised score = (val. - mean) / SD
- + val. => above mean; - val. => below mean; 0 => equal to mean
- Gives the no. of SDs the val. is away from the mean.
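
A quick worked sketch of the standardised score from the notes above, (val. - mean) / SD. The subject names, means and SDs are invented figures, purely for illustration.

```python
def standardised_score(value, mean, sd):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


if __name__ == "__main__":
    # Hypothetical test results: which mark is better relative to its class?
    maths = standardised_score(75, mean=60, sd=10)     # 1.5 SDs above the mean
    english = standardised_score(78, mean=70, sd=4)    # 2.0 SDs above the mean
    print(f"maths: {maths}, english: {english}")
    # The English mark is the stronger result relative to its class,
    # because it sits further above its mean in units of SD.
```

Comparing the two scores, rather than the raw marks, is what allows values from data sets with different means and spreads to be compared fairly.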
CHAIN BASE INDEX
- Compares the price of each yr with that of the prev. yr (from yr to yr).
- C.B.I = (price / price in prev. yr) × 100

SRCC (Spearman's rank correlation coefficient)
- Measures agreement between ranks (uses ranked data).
- Works for linear & non-linear relationships.
- Between -1 & 1:
  - closer to rs = 1 -> strong positive (1 = perfect)
  - closer to rs = -1 -> strong negative (-1 = perfect)
  - 0 = no correlation
- rs = 1 - 6Σd² / (n(n² - 1))

PMCC
- Measures strength of linear correlation between 2 variables.
- Between -1 & 1.

SRCC & PMCC
- Curved (non-linear) relationship: PMCC closer to zero; SRCC closer to ±1 (still strong).
- Straight (more linear) relationship: PMCC and SRCC both closer to ±1.

TIME SERIES
- Line graph with time on the x-axis; used to spot trends.
- Trend lines:
  - show the general trend
  - ignore fluctuations & follow the general trend
  - show an upwards (rising), downwards (falling) or constant trend
  - use LOBF (don't join points)
- Moving avgs.:
  - smooth out fluctuations
  - 4-point -> 4 quarters
  - use LOBF (don't join points)
  - trend line more accurate
- Types of variation: general trend (trend line); seasonal variation (pattern repeats); random variation.
- Seasonal variation = actual val. - trend line val.
- EMSV = avg. seasonal effect.
- Predicted val. = trend line val. + EMSV
- Assumption: seasonal trend continues and overall trend continues.

PROBABILITY
- P(event) = no. of outcomes of event / total no. of outcomes
- Probabilities add to 1.

RELATIVE FREQUENCY
- Relative frequency = no. of times event occurs / no. of trials
- More trials = more accurate.
- Helps spot bias.

SAMPLE SPACE DIAGRAM
- Sample space = list of all poss. outcomes (in a table).

ADDITION LAW
- Mutually exclusive: P(A ∪ B) = P(A) + P(B)
- Non-mutually exclusive: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
- Exhaustive events = at least one must happen (includes all poss. events).

INDEPENDENT EVENTS
- Have no effect on each other.
- P(A ∩ B) = P(A) × P(B)
- P(A|B) = P(A) and P(B|A) = P(B)

CONDITIONAL PROBABILITY
- P(B|A) = probability of B given that A has happened.
- P(B|A) = P(A ∩ B) / P(A)

BINOMIAL DISTRIBUTION
- Has only 2 poss. outcomes.
- 4 conditions:
  1- fixed no. of trials
  2- each trial has 2 outcomes (S or F)
  3- all trials are independent
  4- P(S) is constant
- Notation: X ~ B(n, p)
  - n = no. of trials
  - p = P(S)
  - q = P(F)
  - r = no. of successful trials wanted
- P(r successes) = nCr × p^r × q^(n - r)
- P(at most r) = less than or equal to r
- P(at least r) = greater than or equal to r
- Mean of B(n, p) = n × p
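
A small sketch of the binomial notes above: P(r successes) = nCr × p^r × q^(n - r), "at most", "at least" and the mean n × p. The function names and the coin-toss figures (n = 4, p = 0.5) are illustrative choices, not part of the notes.

```python
from math import comb


def binomial_probability(r, n, p):
    """P(exactly r successes) for X ~ B(n, p): nCr * p^r * q^(n - r), with q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)


def at_most(r, n, p):
    """P(X <= r): add up the probabilities of 0, 1, ..., r successes."""
    return sum(binomial_probability(k, n, p) for k in range(r + 1))


def at_least(r, n, p):
    """P(X >= r): add up the probabilities of r, r + 1, ..., n successes."""
    return sum(binomial_probability(k, n, p) for k in range(r, n + 1))


def binomial_mean(n, p):
    """Mean of B(n, p) = n * p."""
    return n * p


if __name__ == "__main__":
    n, p = 4, 0.5   # e.g. 4 tosses of a fair coin, success = heads
    print(binomial_probability(2, n, p))   # exactly 2 heads
    print(at_most(1, n, p))                # 0 or 1 heads
    print(at_least(2, n, p))               # 2, 3 or 4 heads
    print(binomial_mean(n, p))             # expected number of heads
```

Since "at most 1" and "at least 2" cover every outcome between them, their probabilities add to 1, which is a handy check on any binomial calculation.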
of trials 2- each trial has 2 outcomes. (S or F) Notation: хав си, p) STRIBUTION poss, outcomes. 3- all trials are independent. 4- PCS) is constant. ncr panor PCrRss #heana at most) = plat least) = on mean of B (n, p) = nxp ? S na no of trials. P = P(S) out commes eachother or F 9 = P (F) r = no. of successful triau wanted less than or equal to greater than or equal to.