Scaling for various subjects

Xanthi

Active Member
Joined
Jul 4, 2020
Messages
208
Gender
Undisclosed
HSC
2020
Here we consider the raw-to-scaled-mark conversion (I call it scaling for convenience, although that is not technically correct). All other factors, e.g. aligning, moderation, etc., will not be considered, as it is only the raw-to-scaled-mark conversion that gives insight into how hard it is to achieve a certain scaled score. This table should aid in subject selection* and in quantifying ATAR goals (use an aggregate table). The table below should be read against the 2019 papers of each subject.

The factors that affect this conversion include: the difficulty of the exam or subject (a more difficult exam will see raw marks scaled up more) and, to some extent, the strength of the cohort in other subjects (a stronger cohort in other subjects generally results in slightly better scaling)**.

Subject | Raw mark (2019 exam paper) | Scaled mark | ATAR equivalent | % change, raw to scaled (near a scaled mark of 90)
--- | --- | --- | --- | ---
English Advanced (New) | 70 | 62 | 79.00 |
 | 83 | 80 | 94.95 |
 | 87 | 86 | 98.05 |
 | 90 | 90 | 99.45 | 0%
 | 92 | 93 | 99.85 |
Chemistry (New) | 82 | 85 | 97.60 |
 | 86 | 88 | 98.90 |
 | 90 | 92 | 99.80 | +2%
 | 99 | 98 | 99.95 |
Economics | 83 | 82 | 96.10 |
 | 86 | 85 | 97.60 |
 | 89 | 88 | 98.90 | -1%
 | 96 | 96 | 99.95 |
Physics (New) | 76 | 84 | 97.05 |
 | 83 | 89 | 99.20 | +7%
Biology (New) | 74 | 85 | 97.60 |
 | 79 | 90 | 99.45 | +14%
 | 84 | 92 | 99.80 |
Business Studies | 84 | 79 | 94.30 |
 | 86 | 81 | 95.60 |
 | 90 | 87 | 98.50 | -3%
Math Advanced | 85 | 83 | 96.50 |
 | 96 | 94 | 99.95 | -2%
Math Extension 1 | 81 | 88 | 98.90 |
 | 87 | 90 | 99.45 | +3%
 | 91 | 92 | 99.80 |
Math Extension 2 | 79 | 92 | 99.80 | +16%
Software Design & Development | 82 | 79 | 94.30 |
 | 94 | 96 | 99.95 | +2%
Modern History | 80 | 84 | 97.05 |
 | 92 | 92 | 99.80 | 0%
Legal Studies | 92 | 86 | 98.05 | -7%
 | 96 | 95 | 99.95 |
From this data, some perhaps unexpected trends emerge.

The most evident is the surprisingly high scaling of Biology and the low scaling of Math Extension 1: the scaling of Biology is almost as good as that of Math Extension 2.

The historical data for the math courses are, however, less relevant, as large changes in this conversion factor can be expected (the conversion improves if the new content is examined to a more challenging extent). Scaling under the new science syllabi is likely to be volatile: the scaling of Chemistry and Physics may improve this year (back in line with pre-2019 levels), which would manifest as a more challenging paper. Other subjects should be more or less stable.

*All other factors being equal!

**In an ideal world, cohort strength would not affect scaling, and only the relative difficulty of attaining certain raw marks in different papers would be the determining factor.
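
If anyone wants to play with the table programmatically, here is a minimal Python sketch. The anchor points are copied from the Chemistry row above; linear interpolation between anchors is purely my assumption for illustration, not how the actual conversion works.

```python
from bisect import bisect_left

# Raw -> scaled anchor points for Chemistry (New), copied from the table above.
# Linear interpolation between anchors is an assumption for illustration only.
CHEMISTRY_POINTS = [(82, 85), (86, 88), (90, 92), (99, 98)]

def estimate_scaled(raw, points):
    """Estimate a scaled mark by interpolating between (raw, scaled) anchors."""
    raws = [r for r, _ in points]
    if raw <= raws[0]:
        return points[0][1]
    if raw >= raws[-1]:
        return points[-1][1]
    i = bisect_left(raws, raw)
    (r0, s0), (r1, s1) = points[i - 1], points[i]
    return s0 + (s1 - s0) * (raw - r0) / (r1 - r0)

print(estimate_scaled(88, CHEMISTRY_POINTS))  # 90.0, between the 86 and 90 anchors
```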
 

quickoats

Well-Known Member
Joined
Oct 26, 2017
Messages
970
Gender
Undisclosed
HSC
2019
Not sure where you're getting this data from, or exactly what you're trying to refer to?

The conversion of "raw" HSC marks earned in the actual exam to the published HSC marks is the alignment process (people like to call this scaling); these data are freely available from rawmarks.info
 

jasminerulez

Active Member
Joined
Feb 21, 2013
Messages
237
Gender
Undisclosed
HSC
N/A
This might sound dumb but is there a difference between a scaled mark and an aligned mark?? If not, why does a 70 in English get scaled to a 62, when rawmarks.info showed that a raw mark of 58.5 got scaled to 75 for English Advanced?

I'm sooooo confused rn
 

Xanthi

Active Member
Joined
Jul 4, 2020
Messages
208
Gender
Undisclosed
HSC
2020
Not sure where you're getting this data from, or exactly what you're trying to refer to?

The conversion of "raw" HSC marks earned in the actual exam to the published HSC marks is the alignment process (people like to call this scaling); these data are freely available from rawmarks.info
These are raw to scaled marks (data obtained from rawmarks.info; HSC marks are then converted back to scaled marks using HSCninja). I just did it out of curiosity, and it's a great reference to have when comparing subjects.
 

Xanthi

Active Member
Joined
Jul 4, 2020
Messages
208
Gender
Undisclosed
HSC
2020
This might sound dumb but is there a difference between a scaled mark and an aligned mark?? If not, why does a 70 in English get scaled to a 62, when rawmarks.info showed that a raw mark of 58.5 got scaled to 75 for English Advanced?

I'm sooooo confused rn
The website only shows the aligning process.

The actual scaling process is then applied to the aligned marks (accurate to 2 d.p.).
 

quickoats

Well-Known Member
Joined
Oct 26, 2017
Messages
970
Gender
Undisclosed
HSC
2019
These are raw to scaled marks (data obtained from rawmarks.info; HSC marks are then converted back to scaled marks using HSCninja). I just did it out of curiosity, and it's a great reference to have when comparing subjects.
I see what you tried to do and what data you tried to match up, but this is a misleading resource.

Re: the English Advanced entry in your table:
70 raw HSC -> 82 aligned -> 31.9 per unit -> 62 aggregate for the subject is correct, but this 62 does not indicate a 62 ATAR.

The 0-50 mark you see on HSCninja is the aggregate contribution. Since students generally perform on a curve, your conversion amplifies the differences between subjects dramatically, which can skew someone's interpretation.
E.g. 70-80-90 on your table seem like very considerable jumps, but in terms of ATAR aggregate equivalents it's 77-91-98 (very clustered at that end).

Kudos to you for making an interesting table, but it probably doesn't reflect what you intended it to reflect, so it's likely to cause more confusion than clarity for prospective students.
 

Xanthi

Active Member
Joined
Jul 4, 2020
Messages
208
Gender
Undisclosed
HSC
2020
I see what you tried to do and what data you tried to match up, but this is a misleading resource.

Re: the English Advanced entry in your table:
70 raw HSC -> 82 aligned -> 31.9 per unit -> 62 aggregate for the subject is correct, but this 62 does not indicate a 62 ATAR.
The scaled marks weren't intended to be equal to the ATAR, though; they are directly comparable between subjects.

I'll edit the resource to include the scaled mark -> ATAR equivalent conversion (data again from HSCninja):

Scaled mark | ATAR equivalent (single subject)
--- | ---
>=94 | 99.95
93.5 | 99.90
93 | 99.85
92 | 99.80
91 | 99.65
90 | 99.45
89 | 99.20
88 | 98.90
87 | 98.50
86 | 98.05
85 | 97.60
84 | 97.05
83 | 96.50
82 | 96.10
81 | 95.60
80 | 94.95
79 | 94.30
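
For convenience, the same table as a lookup in Python (a minimal sketch of my own; the thresholds are copied from the table above):

```python
# Scaled mark -> single-subject ATAR equivalent, copied from the table above.
# Entries are sorted from highest threshold to lowest.
SCALED_TO_ATAR = [
    (94, 99.95), (93.5, 99.90), (93, 99.85), (92, 99.80), (91, 99.65),
    (90, 99.45), (89, 99.20), (88, 98.90), (87, 98.50), (86, 98.05),
    (85, 97.60), (84, 97.05), (83, 96.50), (82, 96.10), (81, 95.60),
    (80, 94.95), (79, 94.30),
]

def atar_equivalent(scaled):
    """Return the ATAR equivalent for the highest threshold not above `scaled`."""
    for threshold, atar in SCALED_TO_ATAR:
        if scaled >= threshold:
            return atar
    return None  # below the range covered by the table

print(atar_equivalent(90))    # 99.45
print(atar_equivalent(93.7))  # 99.90
```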
 

Trebla

Administrator
Joined
Feb 16, 2005
Messages
8,392
Gender
Male
HSC
2006
This might sound dumb but is there a difference between a scaled mark and an aligned mark?? If not, why does a 70 in English get scaled to a 62, when rawmarks.info showed that a raw mark of 58.5 got scaled to 75 for English Advanced?

I'm sooooo confused rn
The website only shows the aligning process.

The actual scaling process is then applied to the aligned marks (accurate to 2 d.p.).
Scaling is not applied to the aligned marks. It is applied to the raw marks.

Suggest looking at the flowchart file in the thread below to understand the high-level process.
 

Trebla

Administrator
Joined
Feb 16, 2005
Messages
8,392
Gender
Male
HSC
2006
Here we consider the raw-to-scaled-mark conversion (I call it scaling for convenience, although that is not technically correct). All other factors, e.g. aligning, moderation, etc., will not be considered, as it is only the raw-to-scaled-mark conversion that gives insight into how hard it is to achieve a certain scaled score. This table should aid in subject selection* and in quantifying ATAR goals (use an aggregate table). The table below should be read against the 2019 papers of each subject.

The factors that affect this conversion include: the difficulty of the exam or subject (a more difficult exam will see raw marks scaled up more) and, to some extent, the strength of the cohort in other subjects (a stronger cohort in other subjects generally results in slightly better scaling)**.

From this data, some perhaps unexpected trends emerge.

The most evident is the surprisingly high scaling of Biology and the low scaling of Math Extension 1: the scaling of Biology is almost as good as that of Math Extension 2.

The historical data for the math courses are, however, less relevant, as large changes in this conversion factor can be expected (the conversion improves if the new content is examined to a more challenging extent). Scaling under the new science syllabi is likely to be volatile: the scaling of Chemistry and Physics may improve this year (back in line with pre-2019 levels), which would manifest as a more challenging paper. Other subjects should be more or less stable.

*All other factors being equal!

**In an ideal world, cohort strength would not affect scaling, and only the relative difficulty of attaining certain raw marks in different papers would be the determining factor.
Some of the interpretation here is not quite correct.

First of all, it is actually the (relative) cohort strength that drives scaling, NOT the difficulty of the subject. Scaling is based on the student cohort data received for that year. There is no pre-determined rule that says one subject must be scaled higher than another. Difficulty is a subjective measure: what I perceive to be an easy paper could be perceived by someone else as a difficult paper. There is no way anyone can objectively measure the difficulty of a subject purely by looking at a dataset of marks across a bunch of subjects. What can be looked at is how well students in one subject performed in other subjects, which helps measure the relative cohort strength between subjects.

For example, say your reference subject is Chemistry and you look at the subset of the students within the Chemistry cohort who also did Biology. If the average of that subset (i.e. those who did both Chemistry and Biology) is lower than the average of the whole Chemistry cohort, this suggests the Biology cohort is weaker than the Chemistry cohort on average. Therefore, the scaled mean for Chemistry is expected to be higher than the scaled mean for Biology. Of course, the reverse scenario is also possible where Chemistry gets a lower scaled mean than Biology. It all depends on what the dataset of the marks show.
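
To make the mechanics of that comparison concrete, here is a toy Python sketch. The students and marks are entirely invented, and this is only the flavour of the idea, not UAC's actual algorithm:

```python
# Toy illustration of relative cohort strength (not UAC's actual algorithm).
# marks[subject] maps student -> mark in that subject; all data are invented.
marks = {
    "Chemistry": {"amy": 80, "ben": 72, "cal": 65, "dee": 70},
    "Biology":   {"cal": 70, "dee": 85, "eve": 60},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

chem = marks["Chemistry"]
overlap = set(chem) & set(marks["Biology"])  # students who do both subjects

whole_cohort_mean = mean(chem.values())        # 71.75 with this toy data
overlap_mean = mean(chem[s] for s in overlap)  # 67.5 with this toy data

# The subset who also do Biology averages lower in Chemistry than the whole
# Chemistry cohort, so on this measure the Biology cohort looks weaker and
# Chemistry would be expected to receive the higher scaled mean.
print(whole_cohort_mean, overlap_mean)
```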

Secondly, when people say one subject "scales" better than another subject, that typically refers to comparing the average student in each subject. All this means is that the average student in, say, Mathematics Extension 2 usually receives a higher scaled mark than the average student in Biology. This shouldn't be surprising, as you would expect stronger-performing students in the former.

It is not quite correct to compare the raw marks between Biology and Mathematics Extension 2 because they are not directly comparable. Scaling tries to convert the raw marks with a mathematical model so that the scaled marks are comparable between subjects. Therefore, to compare the scaling of two different subjects you should compare the scaled marks, not the raw marks.

What you are actually doing in your interpretation is comparing roughly the 99th percentile in Biology to roughly the 70th percentile of Mathematics Extension 2 and concluding that Biology scales similarly to Mathematics Extension 2. It is perhaps more correct to say that a student who gets a raw mark of 84 in Biology (which is near the top 1% of the Biology cohort) will get the same scaled mark as a student who gets a raw mark of 79 in Mathematics Extension 2 (which is around the top 20% of the Mathematics Extension 2 cohort). This simply suggests that, "in theory", a student who can get a raw mark of 84 in Biology could also get a raw mark of 79 in Mathematics Extension 2 and therefore get the same scaled mark of 92 (obviously, reality has many other factors to take into account, like personal strengths etc.).
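
As a rough numerical illustration of that last point, a toy Python sketch (normally distributed cohorts that I invented; not the real mark distributions) showing how the same scaled mark can correspond to very different percentiles in two cohorts:

```python
import random

# Invented cohorts for illustration only; real HSC distributions differ.
random.seed(1)
biology_marks = [random.gauss(62, 12) for _ in range(1000)]
ext2_marks = [random.gauss(70, 10) for _ in range(1000)]

def percentile_rank(mark, cohort):
    """Percentage of the cohort scoring at or below `mark`."""
    return 100 * sum(m <= mark for m in cohort) / len(cohort)

# A raw 84 in Biology and a raw 79 in Extension 2 map to the same scaled
# mark (92) in the table above, yet sit at very different points of their
# own cohorts.
print(round(percentile_rank(84, biology_marks)))  # near the top of the cohort
print(round(percentile_rank(79, ext2_marks)))     # noticeably lower
```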
 

Xanthi

Active Member
Joined
Jul 4, 2020
Messages
208
Gender
Undisclosed
HSC
2020
Some of the interpretation here is not quite correct.

What you are actually doing in your interpretation is comparing roughly the 99th percentile in Biology to roughly the 70th percentile of Mathematics Extension 2 and concluding that Biology scales similarly to Mathematics Extension 2. It is perhaps more correct to say that a student who gets a raw mark of 84 in Biology (which is near the top 1% of the Biology cohort) will get the same scaled mark as a student who gets a raw mark of 79 in Mathematics Extension 2 (which is around the top 20% of the Mathematics Extension 2 cohort). This simply suggests that, "in theory", a student who can get a raw mark of 84 in Biology could also get a raw mark of 79 in Mathematics Extension 2 and therefore get the same scaled mark of 92 (obviously, reality has many other factors to take into account, like personal strengths etc.).
You're right about all the specifics of scaling, but in the initial post I defined scaling (incorrectly, but I noted it as such) as the raw-to-scaled-mark conversion, which is what I was referring to in subsequent mentions of the word "scaling", for convenience. These are two completely different things, but most people understand scaling (incorrectly) as relating to the difficulty of the course and the raw-to-scaled-mark conversion percentage, rather than to the strength of the cohort as reflected by the scaled mean, which is why I used the term. The raw-to-scaled-mark conversion also seems a more useful metric than the scaled mean, which probably contributes to the broad misunderstanding.

What you noted in the last paragraph was actually all I intended to do! Basically, a student right now might complete both the Math Extension 2 and Chemistry 2019 papers, convert their raw marks to scaled marks using the table, and then determine which subject requires more work. Likewise, a prospective student might evaluate scoring 84 marks in Biology as easier than scoring what is probably 100 marks in Standard Math on the 2019 papers (due to the perfection required in the latter course), or maybe even 79 marks in Math Extension 2, even though these should technically be of the same difficulty, which could guide subject selection.

About the difficulty of courses: the raw-to-scaled-mark conversion would reflect difficulty only if ability in one course translated directly into ability in another, i.e. a student with the median rank in English would have the median rank in every other subject if everyone did every course. This obviously isn't true, which means even this conversion factor is not really a perfect measure of difficulty. It is, however, a very close way of estimating what you need for a certain aggregate, and of revealing which raw marks would be "judged" equal performances under the scaling system.

In the ideal world described in my second footnote, though, I think the "scaling" (correct definition) and the scaled mean would be influenced only by relative cohort strength, as you note, while the raw-to-scaled-mark conversion would be influenced only by the difficulty of the exam/course?
 

Trebla

Administrator
Joined
Feb 16, 2005
Messages
8,392
Gender
Male
HSC
2006
You're right about all the specifics of scaling, but in the initial post I defined scaling (incorrectly, but I noted it as such) as the raw-to-scaled-mark conversion, which is what I was referring to in subsequent mentions of the word "scaling", for convenience.
Um... I was always referring to scaling as the process to convert raw marks to scaled marks (see the flowchart I linked earlier)?

This raw-to-scaled-mark "conversion" (i.e. when scaling is applied) is not uniform across all the percentiles within a subject (in fact, the scaling report specifically refers to a "non-linear" transformation). Scaled marks are allocated according to the shape of the distribution curve of the marks (which is typically specified by the mean and standard deviation). The notion of a "+16%" conversion based on a single mark in Mathematics Extension 2 tells you nothing about how all the other marks are scaled. If anything, making sweeping conclusions based on such a tiny, cherry-picked sample of marks is very misleading.

Although scaled means (and scaled standard deviations) are also not useful in showing how individual marks are scaled, they at least give you a rough way to describe the cohort as a whole via a measure of centre. This is (slightly) more useful than analysing a rather arbitrary percentile like the 75th percentile.
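
To see why a single-point percentage says so little, here is a toy Python model. It uses a plain linear z-score transform on invented numbers (the real process is explicitly non-linear, so this understates the effect if anything):

```python
# Toy model: map raw marks to scaled marks by matching a target mean and
# standard deviation (a linear z-score transform). The real process is
# non-linear; all numbers here are invented for illustration.
RAW_MEAN, RAW_SD = 68.0, 12.0        # invented raw-mark distribution
SCALED_MEAN, SCALED_SD = 75.0, 8.0   # invented target distribution

def toy_scale(raw):
    z = (raw - RAW_MEAN) / RAW_SD
    return SCALED_MEAN + z * SCALED_SD

for raw in (50, 68, 80, 95):
    scaled = toy_scale(raw)
    print(f"raw {raw} -> scaled {scaled:.1f} ({100 * (scaled - raw) / raw:+.1f}%)")

# Even this linear toy gives a different percentage change at every raw
# mark, so one "+16%" data point says nothing about the rest of the curve.
```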

What you noted in the last paragraph was actually all I intended to do! Basically, a student right now might complete both the Math Extension 2 and Chemistry 2019 papers, convert their raw marks to scaled marks using the table, and then determine which subject requires more work. Likewise, a prospective student might evaluate scoring 84 marks in Biology as easier than scoring what is probably 100 marks in Standard Math on the 2019 papers (due to the perfection required in the latter course), or maybe even 79 marks in Math Extension 2, even though these should technically be of the same difficulty, which could guide subject selection.
This approach of supposedly "exploiting" your subject choices to maximise your scaled marks always leads to the same conclusion: you should choose the subjects you perform best in, based on your personal strengths. You don't need to understand anything about the technicalities of scaling to reach that conclusion.

I do kind of agree about the usefulness of the data for "optimising" your efforts, e.g. whether improving from 87 to 91 in Chemistry is better than improving from 92 to 95 in Mathematics Extension 2 (according to Table A3 of the 2019 scaling report, the former gives an extra +2 scaled marks compared to the latter). However, I should note that unless you reckon you can substantially improve in a subject (i.e. improve by 10+ marks, in which case it is a no-brainer to pursue that), the benefit is quite immaterial and within the margin of error of the estimate.

About the difficulty of courses: the raw-to-scaled-mark conversion would reflect difficulty only if ability in one course translated directly into ability in another, i.e. a student with the median rank in English would have the median rank in every other subject if everyone did every course. This obviously isn't true, which means even this conversion factor is not really a perfect measure of difficulty. It is, however, a very close way of estimating what you need for a certain aggregate, and of revealing which raw marks would be "judged" equal performances under the scaling system.

In the ideal world described in my second footnote, though, I think the "scaling" (correct definition) and the scaled mean would be influenced only by relative cohort strength, as you note, while the raw-to-scaled-mark conversion would be influenced only by the difficulty of the exam/course?
Unfortunately, the whole notion of scaling being a reflection of the "difficulty" of the course is one of the biggest myths floating around and is a classic example of confusing correlation with causation. Therefore, I think we need to address that premise.

Imagine you are the person performing the scaling algorithm. All you have to work with is a dataset of student marks in different subjects. You can't make any pre-determined judgements based on the name of the subject and your algorithm must only be driven by the data itself. How could you possibly quantify a subject's difficulty relative to another? How do you use the data to objectively reconcile the fact that someone like me finds Mathematics Extension 2 easier than say even English Standard but someone else may perceive the complete opposite? Difficulty is a subjective qualitative concept for a single individual based on their personal strengths/weaknesses. Bottom line is that it is not possible to objectively quantify the difficulty of a subject in this context.

I'll even quote the line from the UAC scaling report that appears every single year:
The scaling process is carried out afresh each year. It does not assume that one course is intrinsically more difficult than another or that the quality of the course candidature is always the same.

Basically, we shouldn't even be talking about scaling and subject difficulty in the same sentence.

Going back to my original Chemistry/Biology example: the scaled mean of Chemistry is typically higher than the scaled mean of Biology because, in the dataset, we tend to find that the students within the Chemistry cohort who also do Biology perform lower on average than the Chemistry cohort as a whole. However, the reverse scenario can also occur, where the students within the Chemistry cohort who also do Biology perform higher on average than the Chemistry cohort as a whole. This would lead to the opposite outcome, where the scaled mean of Chemistry is lower than that of Biology. The "difficulty" of either course has remained constant, yet drastically different scaling outcomes are possible. It is purely driven by what the data tell us about the relativities in cohort performance.

At the end of the day, scaling is based purely on relativities. The first aspect is the relative scaled means (and similarly the scaled standard deviations) between subjects, as highlighted in my Chemistry/Biology example. Once these are determined, the second aspect is the individual scaled marks, which are based on the relative ranks (and the gaps between them) of the individual students within each subject; this is driven by the shape of the mark distribution curve. The results of the second step depend on the results of the first.
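
A toy Python sketch of those two steps (everything here is invented; the real model is more involved): step 1 is assumed to have already produced a scaled mean and standard deviation per subject, and step 2 reads each student's scaled mark off that distribution from their within-subject percentile.

```python
from statistics import NormalDist

# Step 1 (assumed done): relative cohort comparisons have produced a scaled
# mean and standard deviation for each subject. All values are invented.
scaled_params = {"Chemistry": (74, 9), "Biology": (66, 10)}

def scaled_mark(subject, percentile):
    """Step 2: read a scaled mark off the subject's scaled distribution."""
    mean, sd = scaled_params[subject]
    return NormalDist(mean, sd).inv_cdf(percentile / 100)

# The same within-subject percentile lands on different scaled marks
# because the cohorts were judged to differ in step 1.
print(f"{scaled_mark('Chemistry', 90):.1f}")  # ~85.5
print(f"{scaled_mark('Biology', 90):.1f}")    # ~78.8
```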

Suggest having a read of the UAC scaling report if you are interested in how it all works.
 
