Separated fit and transform steps. #4

iacobo · 2019-08-02T09:37:28Z

If one sequence contains only some of the letters (e.g. only CGT), the old code assigns integers to its letters inconsistently with sequences that contain a different set of letters (e.g. ACG or ACGT). This is not a problem for long sequences, which are very unlikely to lack any of the 4 letters, but it is very likely to produce errors when applied to short k-mers.

For example, integer_encoder.fit_transform(list('CGT')) and integer_encoder.fit_transform(list('ACGT')) assign different integers to the same letters.

Solution: collect the alphabet of all letters across all sequences first, fit the encoder once on that alphabet, then transform each sequence in the loop.
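A minimal sketch of the difference, assuming `integer_encoder` is a scikit-learn `LabelEncoder` (the PR text does not name the class) and using a made-up `sequences` list for illustration:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical input: short sequences that do not all use every letter.
sequences = ["CGT", "ACGT", "ACG"]

# Old behaviour: refitting on each sequence, so the letter-to-integer
# mapping depends on which letters that sequence happens to contain.
old = [LabelEncoder().fit_transform(list(s)).tolist() for s in sequences]

# New behaviour: fit once on the alphabet of all sequences, then only
# transform inside the loop, so every sequence shares one mapping.
alphabet = sorted(set("".join(sequences)))
integer_encoder = LabelEncoder()
integer_encoder.fit(alphabet)
new = [integer_encoder.transform(list(s)).tolist() for s in sequences]

print(old[0], new[0])  # 'CGT' encoded per-sequence vs. with the shared mapping
```

With the shared fit, C maps to the same integer in every sequence, whereas the per-sequence fit gives C a different integer in 'CGT' than in 'ACGT'.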

