ON THE EVOLUTION OF NEURAL NETWORKS

Author(s)

  • Volodymyr Kuklin, Simon Kuznets Kharkiv National University of Economics, https://orcid.org/0000-0002-0310-1582

DOI:

https://doi.org/10.30890/2709-2313.2025-42-04-001

Keywords:

language models, model scaling, computation graphs, numerical methods, continuum approach, modeling of dynamic systems.

Abstract

The review presents a brief history of the creation of modern neural networks (language models) and the emergence of networks capable of solving problems and supporting scientific activity. It is shown that the main drivers of the evolution of artificial …

Published

2025-09-30

How to Cite

Kuklin, V. (2025). ON THE EVOLUTION OF NEURAL NETWORKS. European Science, 4(sge42-04), 8–70. https://doi.org/10.30890/2709-2313.2025-42-04-001
