Source: https://www.quantamagazine.org/tiny-language-models-thrive-with-gpt-4-as-a-teacher-20231005/

Learning English is no easy task, as countless students well know. But when the student is a computer, one approach works surprisingly well: simply feed mountains of text from the internet to a giant mathematical model called a neural network. That's the operating principle behind generative language models like OpenAI's ChatGPT, whose ability to converse coherently (if not always truthfully) on a wide range of topics has surprised researchers and the public over the past year.

But the approach has its drawbacks. For one thing, the "training" procedure required to transmute vast text archives into state-of-the-art language models is costly and time-intensive. For another, even the people who train large language models find it hard to understand their inner workings; that, in turn, makes it hard to predict the many ways they can fail.

Faced with these difficulties, some researchers have opted to train smaller models on smaller data sets and then study their behavior. "It's like sequencing the Drosophila genome versus sequencing the human genome," said Ellie Pavlick, a language model researcher at Brown University.

Now, in a paper recently posted to the scientific preprint server arxiv.org, a pair of Microsoft researchers have introduced a new method for training tiny language models: raise them on a strict diet of children's stories.

Machine learning researchers have embraced this lesson. GPT-3.5, the large language model that powers the ChatGPT interface, has nearly 200 billion parameters, and it was trained on a data set comprising hundreds of billions of words.
(OpenAI hasn't released the corresponding figures for its successor, GPT-4.) Training such large models typically requires at least 1,000 specialized processors called GPUs running in parallel for weeks at a time. Only a few companies can muster the requisite resources, let alone train and compare different models.

The two researchers showed that language models thousands of times smaller than today's state-of-the-art systems rapidly learned to tell consistent and grammatical stories when trained in this way. Their results hint at new research directions that might be helpful for training larger models and understanding their behavior.

"I found this paper very informative," said Chandra Bhagavatula, a language model researcher at the Allen Institute for Artificial Intelligence in Seattle. "The concept itself is super interesting."

The neural networks at the heart of language models are mathematical structures loosely inspired by the human brain. Each one contains many artificial neurons arranged in layers, with connections between neurons in adjacent layers. The neural network's behavior is governed by the strength of these connections, called parameters. In a language model, the parameters control which words the model might spit out next, given an initial prompt and the words it has generated already.

A model only truly comes to life during training, when it repeatedly compares its own output to the text in its training data set and adjusts its parameters to increase the resemblance. An untrained network with random parameters is trivially easy to assemble from a few lines of code, but it will just produce gibberish. After training, it can often plausibly continue unfamiliar text.
Larger models often undergo further fine-tuning that teaches them to answer questions and follow instructions, but the bulk of the training is mastering word prediction.

Success at word prediction requires a language model to master many different skills. For example, the rules of English grammar suggest that the next word after the word "going" is likely to be "to," regardless of the subject of the text. In addition, a system needs factual knowledge to complete "the capital of France is," and completing a passage containing the word "not" requires a rudimentary grasp of logic.

"Raw language is very complicated," said Timothy Nguyen, a machine learning researcher at DeepMind. "In order for interesting linguistic capabilities to arise, people have resorted to 'more data is better.'"
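The mechanics of next-word prediction can be illustrated with something far simpler than a neural network. The toy sketch below (a hypothetical illustration, not the method from the paper) counts word pairs in a tiny corpus and predicts the most frequent follower of a given word; a real language model learns the same kind of conditional distribution, but with billions of trained parameters instead of raw counts:

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count bigrams in a tiny corpus, then
# predict the most frequent word seen after a given one.
corpus = "we are going to the park . she is going to school .".split()

followers = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    followers[word][nxt] += 1

def predict(word):
    """Return the most likely next word observed after `word`."""
    return followers[word].most_common(1)[0][0]

print(predict("going"))  # prints "to" -- a grammar rule captured as a statistic
```

Even this crude counter recovers the "going" → "to" regularity mentioned above; what it cannot do is generalize to word sequences it has never seen, which is where neural networks earn their keep.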
Tiny Language Models Come of Age
2023-10-09 21:58:15

Once Upon a Time