Class MlBreakEngine
java.lang.Object
com.ibm.icu.impl.breakiter.MlBreakEngine
-
Field Summary
Fields
Modifier and Type  Field
private UnicodeSet  fClosePunctuationSet
private UnicodeSet  fDigitOrOpenPunctuationOrAlphabetSet
private int  fNegativeSum
private static final int  MAX_FEATURE
-
Constructor Summary
Constructors
MlBreakEngine(UnicodeSet digitOrOpenPunctuationOrAlphabetSet, UnicodeSet closePunctuationSet)
    Constructor for Chinese and Japanese phrase breaking.
-
Method Summary
Modifier and Type  Method
int  divideUpRange(CharacterIterator inText, int startPos, int endPos, CharacterIterator inString, int codePointLength, int[] charPositions, DictionaryBreakEngine.DequeI foundBreaks)
    Divide up a range of characters handled by this break engine.
private void  evaluateBreakpoint(String inputStr, int[] indexList, int startIdx, int numCodeUnits, ArrayList<Integer> boundary)
    Evaluate whether the breakpointIdx is a potential breakpoint.
private int  initIndexList(CharacterIterator inString, int[] indexList, int codePointLength)
    Initialize the index list from the input string.
private void  initKeyValue(UResourceBundle rb, String keyName, String valueName, HashMap<String, Integer> map)
    In the machine learning model file, specify the name of the key and value to load the corresponding feature and its score.
private void  loadMLModel()
    Load the machine learning model file.
private String  transform(CharacterIterator inString)
    Transform a CharacterIterator into a String.
-
Field Details
-
MAX_FEATURE
private static final int MAX_FEATURE
-
fDigitOrOpenPunctuationOrAlphabetSet
-
fClosePunctuationSet
-
fModel
-
fNegativeSum
private int fNegativeSum
-
-
Constructor Details
-
MlBreakEngine
public MlBreakEngine(UnicodeSet digitOrOpenPunctuationOrAlphabetSet, UnicodeSet closePunctuationSet)
Constructor for Chinese and Japanese phrase breaking.
Parameters:
digitOrOpenPunctuationOrAlphabetSet - A Unicode set containing digits, open punctuation, and alphabetic characters.
closePunctuationSet - A Unicode set containing close punctuation.
-
-
Method Details
-
divideUpRange
public int divideUpRange(CharacterIterator inText, int startPos, int endPos, CharacterIterator inString, int codePointLength, int[] charPositions, DictionaryBreakEngine.DequeI foundBreaks)
Divide up a range of characters handled by this break engine.
Parameters:
inText - The input text.
startPos - The start index of the input text.
endPos - The end index of the input text.
inString - An input string normalized from inText, covering startPos to endPos.
codePointLength - The number of code points in inString.
charPositions - A map from inString's code point indices to code unit indices.
foundBreaks - A list in which to store the breakpoints.
Returns:
The number of breakpoints.
-
transform
private String transform(CharacterIterator inString)
Transform a CharacterIterator into a String.
-
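The conversion described for transform can be illustrated with the standard java.text API alone. The following is a minimal sketch, not the ICU source: the class name TransformSketch is hypothetical, and restoring the iterator's position afterwards is an assumption made for the example.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

// Hypothetical stand-in for illustration only; not the ICU implementation.
final class TransformSketch {

    // Copies every UTF-16 code unit of the iterator's range into a String.
    static String transform(CharacterIterator inString) {
        StringBuilder sb = new StringBuilder();
        int saved = inString.getIndex(); // assumption: leave the caller's position unchanged
        // Compare indices instead of testing for CharacterIterator.DONE,
        // because U+FFFF can legitimately appear in the text.
        for (char c = inString.first();
             inString.getIndex() < inString.getEndIndex();
             c = inString.next()) {
            sb.append(c);
        }
        inString.setIndex(saved);
        return sb.toString();
    }

    public static void main(String[] args) {
        CharacterIterator it = new StringCharacterIterator("abc");
        System.out.println(transform(it));
    }
}
```

An empty iterator yields an empty string, since first() leaves the index at the end index and the loop body never runs.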
evaluateBreakpoint
private void evaluateBreakpoint(String inputStr, int[] indexList, int startIdx, int numCodeUnits, ArrayList<Integer> boundary)
Evaluate whether the breakpointIdx is a potential breakpoint.
Parameters:
inputStr - An input string to be segmented.
indexList - A code unit index list of the inputStr.
startIdx - The start index of the indexList.
numCodeUnits - The current code unit boundary of the indexList.
boundary - A list including the index of the breakpoint.
-
initIndexList
private int initIndexList(CharacterIterator inString, int[] indexList, int codePointLength)
Initialize the index list from the input string.
Parameters:
inString - An input string to be segmented.
indexList - A code unit index list of the inString.
codePointLength - The number of code points in the input string.
Returns:
The number of code units in the first six characters of inString.
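The relationship between code point indices and code unit indices, which both indexList and charPositions rely on, can be shown with plain Java strings. This is a sketch under stated assumptions: the class CodePointIndexSketch and the method codeUnitOffsets are hypothetical names, and the extra final entry holding the total length is a choice made for the example, not a documented property of the ICU arrays.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not the ICU implementation.
final class CodePointIndexSketch {

    // Entry i is the UTF-16 code unit offset at which code point i starts.
    // The last entry is the total number of code units (an assumption of this sketch).
    static int[] codeUnitOffsets(String s) {
        int codePointLength = s.codePointCount(0, s.length());
        int[] offsets = new int[codePointLength + 1];
        int unit = 0;
        for (int i = 0; i < codePointLength; i++) {
            offsets[i] = unit;
            // Supplementary characters occupy two code units, all others one.
            unit += Character.charCount(s.codePointAt(unit));
        }
        offsets[codePointLength] = unit;
        return offsets;
    }

    public static void main(String[] args) {
        // U+1F600 is written as a surrogate pair, so it spans two code units.
        String s = "a\uD83D\uDE00b";
        System.out.println(Arrays.toString(codeUnitOffsets(s))); // prints [0, 1, 3, 4]
    }
}
```

With such a table, a break found at code point index i maps back to code unit offset offsets[i] in the string.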
-
loadMLModel
private void loadMLModel()
Load the machine learning model file.
-
initKeyValue
private void initKeyValue(UResourceBundle rb, String keyName, String valueName, HashMap<String, Integer> map)
In the machine learning model file, specify the name of the key and value to load the corresponding feature and its score.
Parameters:
rb - A ResourceBundle corresponding to the model file.
keyName - The key name in the model file.
valueName - The value name in the model file.
map - A HashMap to store the pairs of the feature and its score.
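The pairing of features with scores that initKeyValue describes can be sketched without ICU. The example below assumes, purely for illustration, that the model exposes the keys and values as two parallel arrays of equal length; it substitutes plain Java arrays for the UResourceBundle lookups, and the class name KeyValueSketch and the sample feature strings are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; plain arrays replace the resource bundle.
final class KeyValueSketch {

    // Stores each feature string with the score at the same position.
    static void initKeyValue(String[] keys, int[] values, Map<String, Integer> map) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], values[i]);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        // Made-up features and scores, not taken from any real model file.
        initKeyValue(new String[] {"featA", "featB"}, new int[] {12, -7}, scores);
        System.out.println(scores.get("featA")); // prints 12
        System.out.println(scores.get("featB")); // prints -7
    }
}
```

Rejecting mismatched lengths up front keeps a truncated model from silently dropping features.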
-