AMA with the PyTorch Team

The PyTorch team answers questions from the community. Made by Angelica Pan using Weights & Biases


Q for the PyTorch team: Thanks for taking the time to answer questions! 1) Is there a way to print out the model in tabular form like in Keras? 2) When we want to flatten a conv layer before feeding it to a fully connected layer, will we ever have a function called `flatten()` that automatically reshapes the output appropriately for the fully connected layer? Doing the dimension calculations for large models gets tricky.
PyTorch Team:
1. Do you mean showing the structure of the model? A regular print() on an nn.Module works:
print(torchvision.models.resnet50())

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      ...
But it doesn’t show the number of parameters and such. I guess it might be a good idea to add it; do you want to open a GitHub issue with a feature request?
2. I assume the problem is passing the right dimensions to the Linear layer? We don’t have shape inference for layers today, but it’s indeed a popular request, and there’s a related GitHub issue. There are a number of higher-level frameworks that add this functionality in special cases (like PyText); maybe weigh in on that issue with requests to put it more in the core.
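Until something like that lands, a Keras-style table of parameter counts can be approximated with `named_modules()` and `parameters()`. This is a minimal sketch assuming a toy model and a hypothetical `summarize` helper; it reports parameter counts only, not output shapes. The model uses `nn.Flatten` for the conv-to-linear reshape, but note that the Linear input size (8 * 30 * 30 for a 32x32 input) still has to be worked out by hand.

```python
import torch
from torch import nn


def summarize(model: nn.Module):
    """Return (name, layer type, parameter count) for every leaf module."""
    rows = []
    for name, module in model.named_modules():
        # Skip containers so each parameter is counted exactly once.
        if any(True for _ in module.children()):
            continue
        count = sum(p.numel() for p in module.parameters())
        rows.append((name, type(module).__name__, count))
    return rows


model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),  # 3x32x32 input -> 8x30x30
    nn.ReLU(),
    nn.Flatten(),                    # 8x30x30 -> 7200 features per sample
    nn.Linear(8 * 30 * 30, 10),
)

rows = summarize(model)
for name, kind, count in rows:
    print(f"{name:<6}{kind:<10}{count:>10,}")
total = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {total:,}")
```

Only leaf modules are listed, so nested containers such as ResNet's `Bottleneck` blocks show up as their individual conv/bn layers.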

Boris Dayma:
Thanks to the PyTorch team for this event! 1/ Do you have any recommendations or new plans to better profile our models? In particular, when running into GPU memory issues, it would be great to see what impacts memory the most. 2/ Do you have any recommendations or new plans to make deployment easier? I'm thinking in particular of CPU vs GPU, batching queries, distributed models, etc.
PyTorch Team:
1/ Yes, profiling is an active area of improvement for us. There are small improvements for memory profiling in 1.6, and it should get better in future releases. As for common culprits for memory usage:
- figuring out the right batch size to limit the amount of activations that need to be kept around
- sometimes data loading bugs might lead to OOM, e.g. if you have a big queue of input data that is not being processed but takes memory (it depends on your input pipeline, so it’s hard to comment generically)
In some cases gradient checkpointing might be useful to reduce memory usage: it trades extra compute for lower memory by recomputing activations during the backward pass instead of storing them.
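Gradient checkpointing can be sketched with `torch.utils.checkpoint.checkpoint`. This assumes a toy MLP (the `CheckpointedMLP` class is illustrative) and a newer PyTorch release than the 1.6 discussed here, since the `use_reentrant=False` argument was added later. Only the inputs to each block are kept; everything inside a block is recomputed during backward.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class CheckpointedMLP(nn.Module):
    def __init__(self, width=256, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(width, width), nn.ReLU()) for _ in range(depth)]
        )
        self.head = nn.Linear(width, 1)

    def forward(self, x):
        for block in self.blocks:
            # Intermediate activations inside `block` are not kept after the forward pass;
            # the block is re-run during backward to recompute them.
            x = checkpoint(block, x, use_reentrant=False)
        return self.head(x)


model = CheckpointedMLP()
x = torch.randn(32, 256)
loss = model(x).sum()
loss.backward()  # same gradients as without checkpointing; only peak memory differs
```

The saving grows with the number of activations inside each checkpointed block, so it pays off most for deep blocks with large intermediate tensors.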
2/ It’s a broad topic; let me highlight some things, but I’m definitely interested in more details about your use case if you can share. We have several efforts, directly or with partners in the ecosystem, to make deployment easier:
- On the model server side, check out TorchServe, contributed by the AWS team (it’s not AWS-specific). Nvidia Triton is another popular option. They support both Python-level deployment of PyTorch models (i.e. just package your script) and TorchScript deployment (for environments without Python). I think there’s some basic support for batching too.
- For simple cases, even running Python + PyTorch + Flask works pretty well.
- If you have your own backend service, you can serve TorchScript models directly in-process. We provide C++ bindings, and it’s easy to integrate them into pretty much any language.
- For embedded devices, there’s PyTorch Mobile.
All of the above should work with both CPU and GPU pretty transparently. In the simplest case, just move your model as well as the inputs to the GPU by calling `.cuda()` (works in Python or on a TorchScript module in C++). As for distributed serving, there’s nothing ready out of the box, but it’s definitely possible to manually partition the model into pieces (just take individual modules) and use your service infrastructure to orchestrate step-by-step execution (we do some of that internally at Facebook). In general, though, I’m curious where you need distributed serving; I’ve only really seen this need come up for large ranking/recommendation models with huge embedding tables.
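The TorchScript route mentioned above can be sketched as follows, assuming a small stand-in model. Saving to an in-memory `io.BytesIO` buffer stands in for writing a `.pt` file that a serving process would load, and `.to(device)` is used instead of `.cuda()` so the sketch also runs on a CPU-only machine.

```python
import io

import torch
from torch import nn

# Stand-in model; in practice this would be your trained network.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)).eval()

# Compile to TorchScript so it can run without the Python model definition.
scripted = torch.jit.script(model)

# Serialize; a real deployment would write a .pt file instead of an in-memory buffer.
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)
loaded = torch.jit.load(buffer)

# Move the model and the inputs to the GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loaded = loaded.to(device)
x = torch.randn(3, 4, device=device)
with torch.no_grad():
    y = loaded(x)
```

On the C++ side, the same file can be loaded with `torch::jit::load` from LibTorch for in-process serving.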

Really excited to hear from Dmytro! 1. PyTorch has been around for years in a fast-moving research field. What are the biggest changes you've seen during that time, and how have they impacted PyTorch and its engineering process/goals? E.g. in the mid-2010s, advances like GANs, BatchNorm, and residual connections broke lots of abstractions. 2. PyTorch resolved a clear need for a Pythonic and performant library for hardware-accelerated autodiff. What holes of a similar magnitude, if any, do you see in the Python ecosystem today? E.g. R is still preferred for generalized linear models in the statistics/econometrics communities.
PyTorch Team:
1/ That’s a great question! First, PyTorch is only ~4 years old (first public release in January 2017, with development starting about 6 months before that), but a lot of ideas and code were inherited from LuaTorch, which dates back to the mid-2000s.
I think the key strength of PyTorch (and LuaTorch, though Lua has its peculiarities as a programming language) is treating the model as code (i.e. “eager mode”). It’s great for flexibility and modularity. It also means that PyTorch is not really a monolithic framework but rather a set of libraries exposing different levels of API. So if research evolves in new directions (e.g. residual connections or transformers), it’s usually not a show-stopper for advancement. An individual user is usually able to assemble a working (but maybe not the most performant) implementation of a new idea, and if it proves useful, PyTorch (or other libraries) can gradually catch up with better optimizations and techniques.
So while the field is changing constantly, we have a pretty good process to listen to the community, see which techniques become generally useful, and absorb them into the domain libraries (torchvision/torchtext/torchaudio) or directly into PyTorch core as they mature. One important trend is that as hardware gets faster and faster, getting good performance by running individual operators from Python gets harder for some models. We’re investing in various compilation techniques to overcome this (e.g. the PyTorch JIT fuser, which is being further improved, community integrations with the TVM stack, etc.). As for overall project goals, we started by focusing on research and gradually expanded to also cover aspects of productionizing models. It’s a broad field, and there’s definitely a lot more for us to do.
2/ I might have a skewed perspective, so this is more of a personal opinion: data manipulation and preprocessing infrastructure could be way better, especially for bigger datasets (e.g. pandas with better acceleration like cuDF, and generally a better connection between the data infrastructure world, like Spark, and the world of NNs). It’s hard for me to comment on specific non-NN techniques, as I’ve worked less with them personally.

Question for the PyTorch team: How is Phabricator used in the code maintenance and PyTorch development cycle?
PyTorch Team:
Good question, probably prompted by those “If you’re a Facebook employee, you can see DXXX on Phabricator” GitHub comments 🙂 PyTorch is an open source project; the source of truth for the codebase is GitHub, and we do all code reviews and issue tracking there. Internally at Facebook, we mirror the PyTorch repository at the commit level and use its master version for internal builds (it’s almost the same as using nightlies). We did some further infrastructure integration to run internal CI on a per-commit basis to catch more problems early. Facebook internally uses Phabricator for its PR management; that’s why you see those cryptic comments.

Grant Reaber:
Regarding profiling, I am particularly curious how to tell whether code is bottlenecked by factors such as moving things between the CPU and GPU or kernel launch overhead. In general, we often seem to get low GPU utilization with our models, and we find that some things that seem like they should be fast, like sampling from mixture distributions, are actually slow.
PyTorch Team:
Profiling is a bit of an art; it’d be great for someone to actually write up the best practices (either core devs or someone from the community). General recommendations:
- Isolate whether data reading is a bottleneck, e.g. measure the speed of data reading only, or of only the model part with a single batch / random data.
- Timeline profiling is very useful. You can use the PyTorch autograd profiler (despite the name, it doesn’t really have much to do with autograd) to get a snapshot of what’s happening and visualize it in Chrome Tracing. You want to see the GPU nicely busy with kernels all the time. If there are gaps, they might be caused by CPU processing. If the kernels are tiny, it’s probably dominated by kernel launch overhead. If that’s the case, writing a single CUDA kernel might be beneficial (directly or with e.g. Numba).
There’s some automatic fusion in PyTorch itself (if you trace that part of your model); as mentioned above, we’re actively working to make it more powerful. We’d also be very interested in your model code to see how we could extract better performance automatically. If you could share it, it’d be awesome (you can ping me or post on GitHub or the forums).
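Both recommendations can be sketched in a few lines, assuming a toy CPU model and a single fixed random batch (so data loading is excluded from the timing). Timing a loop over your `DataLoader` alone, with no model, gives the other half of the comparison. The profiler call is `torch.autograd.profiler.profile`; newer releases also offer `torch.profiler`.

```python
import os
import tempfile
import time

import torch
from torch import nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
# One fixed random batch: timing it excludes data loading entirely.
batch = torch.randn(64, 512)


def seconds_per_step(steps: int = 20) -> float:
    start = time.perf_counter()
    for _ in range(steps):
        model(batch).sum().backward()
    # On a GPU you would call torch.cuda.synchronize() here, since kernels run asynchronously.
    return (time.perf_counter() - start) / steps


model_only = seconds_per_step()
print(f"model only: {model_only * 1e3:.2f} ms/step")

# Timeline snapshot of one step; open the JSON file in chrome://tracing.
with torch.autograd.profiler.profile() as prof:
    model(batch).sum().backward()
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
trace_path = os.path.join(tempfile.gettempdir(), "step_trace.json")
prof.export_chrome_trace(trace_path)
```

If model-only steps are much faster than steps fed from the real data pipeline, the input pipeline is the bottleneck rather than the GPU kernels.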

Nick Bardy:
Questions for the PyTorch team: One of the things that I always find fascinating about software projects is how early design decisions affect the shape of the system over time. 1. If you could start PyTorch again from scratch, what design decisions would you change from the start, and why? 2. What design decisions did you make early on that helped shape PyTorch into the successful library it is today? 3. What are the most exciting papers you've read recently, and why do they excite you?
PyTorch Team:
Awesome questions!
1/ PyTorch started as a thin (mostly Python) wrapper on the TH libraries, which date back to the LuaTorch days of the 2010s (and have a lot of ugly C code). That allowed the project to start really quickly, but brought some baggage that takes much more effort to address later on. One aspect is some annoying API incompatibilities with NumPy inherited from those days (dim vs axis, etc.). We have some work to address those now (e.g. we just added complex numbers in 1.6 and fixed division op behavior), but if starting from scratch we’d just go with NumPy syntax right away. A more internal aspect is the reliance on codegen in PyTorch internals to wrap libraries (yes, we have Python scripts that generate C++). Such scripts are powerful to start with but really hard to maintain. There’s ongoing work to gradually reduce or eliminate our reliance on codegen, but it’s a long journey.
2/ I talked a bit about it in another answer above: I think the key strength of PyTorch is treating the model as code (i.e. “eager mode”). It’s great for flexibility and modularity. It allows users to bring in other libraries from the Python ecosystem to fill the gaps. It also means that PyTorch is not really a monolithic framework but rather a set of libraries exposing different levels of API. So if research evolves in new directions (e.g. residual connections or transformers), it’s usually not a show-stopper for advancement. An individual user is usually able to assemble a working (but maybe not the most performant) implementation of a new idea, and if it proves useful, PyTorch (or other libraries) can gradually catch up with better optimizations and standardized APIs. Another important factor (though not really a design decision) is listening to user feedback continuously and iterating. So please continue providing constructive criticism on the forums and GitHub.
3/ It’s a hard one; there’s too much fun stuff going on in AI and AI systems. Personally, I’ve been reading some papers on 3D object detection and tracking, generally related to the AV field (my wife and many friends now work for self-driving companies, so I’m curious what the state of the art there is). Compared with 2D computer vision, there’s way less convergence there and many competing approaches. Also, the sparse data of point clouds actually requires innovation in compute primitives (like sparse convolution, voxelization, etc.). More on the systems side, there’s interesting work on distributed training, e.g. DeepSpeed/ZeRO.

Saurabh Kataria:
One question for the PyTorch team: Do you think it could be a good design philosophy (for future PyTorch versions) for the code to give scientific suggestions to the user? Machine learning is a fast-evolving field, and keeping track of recent (or even relevant) research is hard. For example, if the code automatically detects that one block in the pipeline is slow, it could recommend trying the "data echoing/caching" approach recently proposed by Google, which addresses that issue. I guess I am wondering about intelligent feedback coming from PyTorch code.
PyTorch Team:
Yes, I think “user-friendly top-level suggestions” is definitely a very cool idea. It’s quite hard to execute and get good precision/recall. I agree with you that it’s more promising on the system performance side, e.g. incorporating such suggestions into profilers / perf tools. Doing so for ML algorithms is quite hard generically because best practices vary a lot between domains; I’d rather imagine it appearing in specific model implementations (e.g. HF transformers). On the perf side, frankly, we need to build much better tools, and we started work in this area this year (see the updates to the autograd profiler and memory profiler). Once we get the basics right, exploring higher-level suggestions is the next step. It’d be interesting to figure out how the community can easily write “recipes” that take profiling info and generate warnings/suggestions. As for whether it’s possible in general, I’ve seen tools like that be pretty useful for traditional services engineering at Facebook, i.e. pulling a few dozen different stats about a service and trying to warn about what doesn’t look right.

Brian W:
Are there any serious plans to construct complex-number-based tensors, with autodifferentiation, that are consistent with complex analysis, even if only Wirtinger calculus initially? For me, we are applying deep learning to signal processing and physics-informed NNs, and the lack of good tools in this area is restrictive. The common workaround is to use two matrices. I think there are huge areas of research that PyTorch could accelerate if this existed.
PyTorch Team:
You’re in luck: we just released complex tensor support in PyTorch core in 1.6 (before that, it was available as a third-party package), and it works with autograd too. I’m personally a bit rusty with complex calculus; you can check the docs or ask on the forums. Alban Desmaison is one of the key developers for this.
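A small check of what complex autograd looks like in practice, written against a recent PyTorch release (complex support has grown a lot since 1.6, so older versions may reject some of these calls). It compares the gradient of a real-valued loss on a complex tensor with the "two real matrices" workaround from the question, using `torch.view_as_real` and `torch.view_as_complex`. For this loss, both approaches give 2z.

```python
import torch

torch.manual_seed(0)
z = torch.randn(4, dtype=torch.cfloat, requires_grad=True)

# Real-valued loss of a complex input: sum of squared magnitudes.
loss = (z.abs() ** 2).sum()
loss.backward()

# The same loss with the common workaround: real and imaginary parts as a real tensor.
w = torch.view_as_real(z.detach()).clone().requires_grad_(True)  # shape (4, 2)
(w ** 2).sum().backward()

print(z.grad)
print(torch.view_as_complex(w.grad))  # same values, re-packed as complex numbers
```

Because the gradient of a real-valued loss matches the real/imaginary-split version, a plain gradient-descent step `z - lr * z.grad` moves in the steepest-descent direction, as it would with the two-matrix workaround.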

Daniel Cooper:
Question for the PyTorch team: Do you have any general recommendations for strategies to speed up population-based training in the context of 1-4 GPUs on a single machine? I was really impressed with the POET algorithm from Uber, and I think the idea of learning a curriculum / co-evolving challenges and solutions is really powerful... but I wonder if there is a way to maintain a population of agents (NNets) that you can train with gradient descent without requiring a massive number of GPUs. I thought about having multiple (small) models on the same GPU, trying to more efficiently load/unload weights to/from the GPU, some sort of weight sharing, etc. Long shot, but I thought I'd ask if you had any recommendations.
PyTorch Team:
Sorry, I haven’t really worked with such approaches much. The ideas you mentioned sound plausible. Weight sharing and copying the models to the GPU in stages (you could just call .cpu() or .cuda()) might be options. If you have a lot of host memory, you could try using CUDA unified memory to let CUDA manage the CPU/GPU offload; I’d need to actually look into how to hook it up with PyTorch.
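The "copy models to the GPU in stages" idea can be sketched like this. The population size, the tiny MLPs, and the `train_member` helper are all illustrative assumptions. It relies on `nn.Module.to()` updating parameters in place by default, so optimizers created up front keep pointing at the right tensors. Plain SGD without momentum keeps no per-parameter state; a stateful optimizer such as Adam would create its state on the GPU, and that state would stay there unless you move it too.

```python
import torch
import torch.nn.functional as F
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(0)

# A population of small models kept in host memory between training steps.
population = [nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 1)) for _ in range(8)]
# Plain SGD (no momentum) keeps no per-parameter state, so staging weights is safe.
optimizers = [torch.optim.SGD(m.parameters(), lr=0.05) for m in population]


def train_member(model, optimizer, x, y):
    model.to(device)  # stage this member's weights onto the accelerator
    optimizer.zero_grad()
    loss = F.mse_loss(model(x.to(device)), y.to(device))
    loss.backward()
    optimizer.step()
    model.to("cpu")  # release accelerator memory for the next member
    return loss.item()


x, y = torch.randn(128, 16), torch.randn(128, 1)
losses = [train_member(m, opt, x, y) for m, opt in zip(population, optimizers)]
print([round(v, 3) for v in losses])
```

If the whole population fits in GPU memory at once, skipping the round trips and keeping every member on the device is simpler and faster; staging only pays off when it doesn't fit.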

Han Lee:
1. How does the PyTorch team determine its development direction and the balance between key core libraries vs the ecosystem? 2. What's in the bag for code/performance profiling and serving tools? Also, what about data fairness profiling tools, perhaps not from PyTorch, but within FAIR? 3. Microsoft is now the maintainer of PyTorch for Windows; how will that impact release schedules or potential offsets between Windows and Linux? 4. Apple is switching Macs to ARM, hopefully with their own tensor cores in the CPU. And there's always AMD. How much effort is going into supporting an even wider array of hardware systems?
PyTorch Team:
1/ We listen to the community a lot through GitHub, forums, conferences, and personal contacts. For PyTorch core, we’re trying to play to our existing strengths (accelerated tensor library, NNs, autodiff) and support ecosystem projects in complementary areas (e.g. TorchServe, MLflow, Lightning, OpenMined).
2/ On performance/serving, see above; I’ve put some details in other answers. On fairness, it’s an active area of research, but I don’t know much about the details on the FAIR side. There’s something called the ‘What If’ tool in Captum / TensorBoard, but I personally don’t know the details.
3/ No change to releases: it’s the same codebase and the same release process, just now with Microsoft’s help to support Windows-specific functionality and keep CI green. There are some existing gaps with Windows today (e.g. distributed), and Microsoft is working with us to close them. Btw, it’s super exciting to have them on board to make Windows support awesome.
4/ We have to balance and prioritize (there are 40+ ML accelerator startups out there!) but try to support the community. Re AMD, we’ve been collaborating with them for a while; AMD support has been on master for more than a year, and it’s part of our CI. There are still some stability gaps, so we haven’t made it part of the releases yet, but you can try it from the AMD website. AMD folks are active on PyTorch GitHub and will help you out with questions/issues. On the ARM switch for Macs: we actually already have some support for ARM CPUs because of PyTorch Mobile. For special hardware features, it depends on what they are and how much demand there is from users wanting to train on Macs. So I’d say let’s wait and see.

Daniel Cooper:
Question for the PyTorch team: Any predictions on the future of deep learning libraries / programming languages? PyTorch seems like the clear winner, but just thinking about languages like Julia / Swift (and S4TF) and libraries like Haste, Jax, etc.
PyTorch Team:
Moving to other programming languages for ML is a big change with a lot of tradeoffs, and it’s extremely hard to predict where the center of gravity will shift. I think Julia and Swift are better languages than Python, and the e2e development approach is super powerful: one can write top-level APIs and CUDA kernels in the same language. But there’s one con: it’s not Python. The Python language itself is actually not that great for ML (it’s slow), but the Python ecosystem is amazing. (Heck, even Lua is a better language by itself in terms of perf, but it lacks libraries.) I’d say in a 2-3 year timeframe not much will change re Python; maybe it will over a longer timeframe, but that’s hard to predict. JAX is very cool; those folks are doing really innovative stuff in the ML systems space. It really shows the power of program transformations on simplified program semantics and how to leverage compilers to create good UX. We’re experimenting with some ideas on how to bring these transforms to PyTorch too, though it’s harder because PyTorch tensor semantics are broader (e.g. support for views and mutability). Overall, I think compilation is very important for future progress; we’re investing in it too, and it’s exciting to see projects like TVM/Halide.

Cuong Barry:
Hi PyTorch team, many thanks for your excellent DL framework; I switched from TensorFlow/Keras to PyTorch this past year. Let me ask some questions: 1/ Deep learning algorithms usually require a complex architecture with millions of parameters, leading to heavy models that need substantial resources for deployment and prediction. Is there any way to enable model pruning, distillation, or otherwise make the trained model smaller while still guaranteeing comparable performance once training is complete? 2/ What is the core improvement of PyTorch compared with other existing frameworks such as TensorFlow, Keras, …?
PyTorch Team:
1. The techniques you mentioned all make sense. The catch is that they all navigate a tradeoff between accuracy and system performance, so the optimal point is going to be different for different models and use cases. I’d say quantization approaches are the more mature and generally applicable ones among them. Overall, I like the name “system-model co-design” for such approaches. They require a lot of trial and error, just like ML research. What could be better is automation to explore different techniques. It probably sits at the level above PyTorch; there are some narrower projects doing it (like architecture search), but it’s hard to have something comprehensive so far.
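Since quantization was singled out as the more mature option, here is a minimal sketch of post-training dynamic quantization with `torch.quantization.quantize_dynamic` (newer releases also expose it as `torch.ao.quantization.quantize_dynamic`). The toy model is an assumption, the call needs a CPU quantization backend (fbgemm on x86, qnnpack on ARM), and the accuracy/size tradeoff has to be re-checked on your own model and data.

```python
import io

import torch
from torch import nn

model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

# Post-training dynamic quantization: Linear weights are stored as int8 and
# activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)


def serialized_size(m: nn.Module) -> int:
    buffer = io.BytesIO()
    torch.save(m.state_dict(), buffer)
    return len(buffer.getvalue())


x = torch.randn(4, 256)
with torch.no_grad():
    max_diff = (model(x) - quantized(x)).abs().max().item()
print(f"fp32: {serialized_size(model)} bytes, int8: {serialized_size(quantized)} bytes")
print(f"max abs output difference: {max_diff:.4f}")
```

Dynamic quantization needs no calibration data, which makes it a cheap first experiment; static quantization and pruning (`torch.nn.utils.prune`) take more setup.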
2. Flexibility and the principle of “just write code” (eager mode); I talked about it above. Also, the community is amazing (thank you!), and use in research means that you will find the latest and greatest models in PyTorch. Actually, PyTorch’s design influenced TensorFlow 2.0 quite a bit, hehe.

Piyush Agarwal:
Q for the PyTorch team: 1. Is there a plan to make deploying models on edge devices easier with PyTorch? Thanks to the W&B team for hosting this AMA.
PyTorch Team:
Have you tried PyTorch Mobile? It’s a bit early, but give it a try and give us feedback. Very low-power devices (like microcontrollers) are a bit out of scope: they often require very hardware-specific tuning, while the approach we take with PyTorch Mobile is to minimize the conversion process (“it’s the same PyTorch”). Maybe you could export your model to ONNX and use a specialized runtime for your platform (depending on what it is).

Koustuv Sinha:
Question for the PyTorch team: Is there any roadmap to integrate graph neural network models into `torch.nn`, following the success of PyTorch Geometric and DGL? Thanks for this AMA!
Adding on to this question -- any plans on adding the ability to trace models for PyTorch Geometric?
PyTorch Team:
No concrete roadmap. The general approach we take is to empower individual subcommunities to innovate quickly and slowly graduate broadly applicable ideas into the domain libraries (like torchvision) or core. I personally follow Geometric / DGL less closely, so I can’t comment on details. Maybe you could create a GitHub/forums question, and I can tag folks who know more. Re tracing/scripting of pytorch_geometric, I think there’s actually ongoing work to make it TorchScript-compatible, as I remember some folks from our team talking to the Geometric authors. I can try to find pointers for you later.

Brian W:
Is there anything particularly exciting to you on the horizon with PyTorch and Nvidia Rapids?
PyTorch Team:
The power of modular libraries is that you can use them together. No concrete plans beyond that at this point. There might be some polish needed to make e2e structured-data use cases with NNs easier. We’re generally interested in growing this area, but there’s no concrete investment so far. Can I ask in reverse: what type of use cases motivated your question?

Stacey Svetlichnaya:
Thanks so much for this AMA! The more I explore PyTorch, the more I love it. Which upcoming features or new directions/use cases are you most excited about for PyTorch? Also, beyond PyTorch, I'm curious what you find most inspiring/promising in the broader ecosystem of deep learning tools (e.g. for debugging, visualization, inference/deployment, etc).
PyTorch Team:
Oh, there’s a lot to list here; just a random sampling of directions:
- better performance tooling, debuggers, etc.
- performance improvements through compiler approaches, e.g. fusion of a lot of small operators
- NumPy compatibility
- further improvements to autograd, e.g. second-order derivatives
- continuing to improve “research to production”: deployment tools, model co-design, etc.
On the ecosystem side there’s even more to list. One pick would be end-to-end ML pipeline integrations (more on the production side) or experiment management, stuff like Weights & Biases or MLflow. I think this part of practical ML development is lacking in the OSS ecosystem in general today.

Question for the PyTorch team: Does the PyTorch team have something along the lines of knowledge distillation and model compression with deep learning on the roadmap? This could be something like a metric of how difficult/easy a particular input is for the task, or something that gives more insight into the appropriate depth/architecture of the student network in a teacher-student scenario, using gradient flow information while training the teacher network.
PyTorch Team:
We don’t have concrete plans on the core side, but there are many projects in the ecosystem. There are many approaches and I don’t think a single one emerged yet that would be more universally well-applicable.

@pytorch team: TensorBoard gives brilliant visualisation and is useful for debugging and comparison. Any plans to enhance Visdom? It would be very useful and efficient for us.
PyTorch Team:
We support TensorBoard integration in PyTorch core, and the Google folks did great work on it. Visdom is also a cool project, though it’s not directly supported by our immediate team. Any particular enhancements you have in mind? It’s probably best to create GitHub issues for them.

Any plans on enhancing GPyTorch and giving more examples and use cases for meta-learning with it? That would be very helpful.
PyTorch Team:
GPyTorch is actually not part of PyTorch itself, but it’s a very important part of the ecosystem (some of the contributors are from Facebook, though the core is from Cornell). I’d suggest posting concrete requests on the respective GitHub repos.

Last one: is there something like torch.nn.DataParallel for using multiple TPUs? This is something I need help on.
PyTorch Team:
I’m not that hands-on with the PyTorch-XLA integration, but you can ask on the forums or GitHub; the folks from Google who support it are amazing and super responsive! I think XLA actually does data parallelism on its own under the hood, so you can’t really use torch.nn.DataParallel directly, because the Python process doesn’t have access at that low level (the entire graph gets handed over to XLA, and it does its magic for TPUs). It’s less flexible for the user, but that’s the only way TPUs are exposed to the world.

Sumukh Aithal:
Do you have a timeline or any plans of integrating PyTorch_xla into Core PyTorch (similar to amp) or providing direct support to TPUs?
PyTorch Team:
Not at the moment. The team supporting PyTorch XLA is awesome and supports the project really well. We also have XLA builds in PyTorch CI, so we do make sure stuff is not broken even on nightlies.

Yong Zheng Xin:
Questions for PyTorch: 1. What are the product milestones for PyTorch by the end of this year? 2. Does PyTorch intend to incorporate methods to visualize what each of the `nn.Transformer` layers learns (such as self-attention, etc.)?
PyTorch Team:
1/ I listed some of these above; just a random sampling of directions:
- better performance tooling, debuggers, etc.
- performance improvements through compiler approaches, e.g. fusion of a lot of small operators
- NumPy compatibility
- further improvements to autograd, e.g. second-order derivatives
- continuing to improve “research to production”: deployment tools, model co-design, etc.
2/ Not in the core itself, but there’s a lot of visualization/model understanding work going on around TensorBoard and PyTorch Captum. I’m not sure about transformer visualization specifically, though.

Ayush Thakur:
1. What's the use of `nn.ModuleList`, and how do you use it correctly? Sorry if I am being naive here, but I am unable to grasp the motivation behind `nn.ModuleList`. 2. What's the best way to use a protobuf file with PyTorch? Or what's the best way to use TFRecord with PyTorch?
PyTorch Team:
1/ See the docs. There are two reasons:
- nn.Module “knows” about ModuleList, so the layers it holds are registered as submodules: calling `children()` on the parent module will show them, and their parameters show up in `parameters()`. That won’t happen with a regular Python list (you’d need to `add_module` each element manually).
- TorchScript knows about ModuleList and can understand iteration over it in a scripted method. TorchScript doesn’t allow regular Python lists to contain Modules, because it has to understand your module structure statically.
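The difference shows up directly in parameter counts. In this minimal sketch (the two class names are illustrative), the plain-list version registers nothing, so an optimizer built from `parameters()` would silently skip those layers, while the `ModuleList` version registers all three layers and can be compiled with `torch.jit.script`.

```python
import torch
from torch import nn


class PlainList(nn.Module):
    def __init__(self):
        super().__init__()
        # A regular Python list: these layers are invisible to the parent module.
        self.layers = [nn.Linear(4, 4) for _ in range(3)]


class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # ModuleList registers each layer as a submodule of this module.
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

    def forward(self, x):
        # TorchScript can compile a for-loop over a ModuleList.
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x


plain, registered = PlainList(), WithModuleList()
print(sum(p.numel() for p in plain.parameters()))       # 0: nothing registered
print(sum(p.numel() for p in registered.parameters()))  # 60: 3 x (4*4 weights + 4 biases)
scripted = torch.jit.script(registered)
print(scripted(torch.randn(2, 4)).shape)
```

The same registration is what makes `.to(device)`, `.cuda()`, and `state_dict()` cover the listed layers.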
2/ You can just use the protobuf libraries to decode it directly. If protobuf-python proves too slow, writing a simple C++ custom op that uses the C++ protobuf APIs to decode and construct Tensors is probably the way to go. We don’t have anything ready for TFRecord specifically, but it wouldn’t be hard to roll something together as I just described. You could still put your custom op in a DataLoader and use the rest of the infrastructure. Reversing the question: can you describe your use case in more detail? Why do you use TFRecord, and what structure of data do you store? (just curious)
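A minimal sketch of the "roll something together" approach in pure Python, assuming the standard TFRecord framing: each record is a little-endian uint64 length, a 4-byte masked CRC32 of that length, the payload, and a 4-byte masked CRC32 of the payload. `RawRecordDataset`, `write_records`, and `decode_floats` are hypothetical names. The sketch does not verify CRCs and leaves protobuf decoding (e.g. of `tf.train.Example` messages) to the `decode` callable you pass in. Here the payloads are just packed float32 values so the example runs without protobuf installed.

```python
import os
import struct
import tempfile

import torch
from torch.utils.data import DataLoader, IterableDataset


class RawRecordDataset(IterableDataset):
    """Streams records from a TFRecord-framed file; CRCs are NOT verified."""

    def __init__(self, path, decode):
        self.path = path
        self.decode = decode  # bytes -> tensor (e.g. a protobuf parser you supply)

    def __iter__(self):
        with open(self.path, "rb") as f:
            while True:
                header = f.read(12)  # uint64 length + uint32 masked CRC of the length
                if len(header) < 12:
                    return
                (length,) = struct.unpack("<Q", header[:8])
                payload = f.read(length)
                f.read(4)  # masked CRC of the payload (skipped in this sketch)
                yield self.decode(payload)


def write_records(path, payloads):
    """Demo helper: writes TFRecord framing with zeroed (invalid) CRC fields."""
    with open(path, "wb") as f:
        for p in payloads:
            f.write(struct.pack("<Q", len(p)) + b"\0" * 4 + p + b"\0" * 4)


def decode_floats(payload):
    return torch.tensor(struct.unpack(f"<{len(payload) // 4}f", payload))


path = os.path.join(tempfile.gettempdir(), "demo.records")
write_records(path, [struct.pack("<4f", *([float(i)] * 4)) for i in range(6)])
loader = DataLoader(RawRecordDataset(path, decode_floats), batch_size=2)
batches = [b for b in loader]
```

With `num_workers > 0`, every worker would read the whole file; sharding records by `torch.utils.data.get_worker_info()` avoids the duplicates.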

Anil Odzemir:
Question for the PyTorch team: Are you planning to "support" reservoir computing? As in, creating layers/modules. Currently, there are many implementations; it would be nice if there were a unified framework.
PyTorch Team:
Our general approach for pulling something into core is to first see that the community is converging on the same techniques. E.g. we recently pulled multi-head attention / transformer building blocks into core based on this reasoning. Reservoir computing probably actually fits this bar too. Do you want to create a GitHub issue with a feature proposal? Or, if you want, maybe even contribute an implementation.
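For a sense of what such a building block might look like, here is a hypothetical echo-state-style reservoir module; it is not an existing PyTorch API. It assumes the common recipe: fixed random input and recurrent weights, the recurrent matrix rescaled to a chosen spectral radius, a leaky tanh update, and only a linear readout trained on the collected states. The weights are registered as buffers so they move with `.to()` and are saved in `state_dict`, but are never returned by `parameters()`. `torch.linalg.eigvals` requires a newer release than 1.6.

```python
import torch
from torch import nn


class EchoStateReservoir(nn.Module):
    """Fixed random recurrent layer (echo state network style); nothing here is trained."""

    def __init__(self, input_size, hidden_size, spectral_radius=0.9, leak=1.0, seed=0):
        super().__init__()
        g = torch.Generator().manual_seed(seed)
        w_in = torch.rand(hidden_size, input_size, generator=g) * 2 - 1
        w = torch.rand(hidden_size, hidden_size, generator=g) * 2 - 1
        # Rescale so the largest eigenvalue magnitude equals spectral_radius (a common heuristic).
        w = w * (spectral_radius / torch.linalg.eigvals(w).abs().max())
        # Buffers move with .to()/.cuda() and are saved in state_dict, but are not parameters.
        self.register_buffer("w_in", w_in)
        self.register_buffer("w", w)
        self.leak = leak

    def forward(self, x):  # x: (seq_len, batch, input_size)
        h = x.new_zeros(x.shape[1], self.w.shape[0])
        states = []
        for x_t in x:
            update = torch.tanh(x_t @ self.w_in.T + h @ self.w.T)
            h = (1 - self.leak) * h + self.leak * update
            states.append(h)
        return torch.stack(states)  # (seq_len, batch, hidden_size)


reservoir = EchoStateReservoir(input_size=3, hidden_size=50)
readout = nn.Linear(50, 1)  # only the readout would be trained
states = reservoir(torch.randn(10, 4, 3))
prediction = readout(states)
```

Because the reservoir has no parameters, passing `readout.parameters()` to the optimizer is all it takes to train only the readout.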

Rajesh Shreedhar Bhat:
Question for the PyTorch team: any plans to have .fit or .fit_generator functionality similar to Keras/TF 2.0 in PyTorch?
PyTorch Team:
It’s a common debate topic: should PyTorch itself have a “blessed” training loop? It’s actually a hard tradeoff: it’s hard to abstract one API to satisfy all use cases, and attempts to do so frequently fail (Keras’s fit is also a bit restrictive). That’s why we’ve been holding off on blessing one training loop until now. Specific frameworks (like HF transformers) usually provide one that best targets their use case. There are libraries in the ecosystem to make it easier:
- PyTorch Lightning is really cool, develops quickly, has a lively community, and generally gets good feedback. It’s probably your best approximation of .fit().
- PyTorch Ignite is very popular too, but has a somewhat different API paradigm built around event loops.
- library is also great, but more opinionated.
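For comparison with Keras's `.fit`, this is roughly the loop such helpers wrap, written as a bare-bones hypothetical `fit` function (not a PyTorch API). The toy linear-regression data is an assumption; libraries like Lightning add logging, checkpointing, and multi-device support on top of the same steps.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def fit(model, loader, loss_fn, optimizer, epochs=1, device="cpu"):
    """A bare-bones, Keras-style training loop; returns the mean loss per epoch."""
    model.to(device)
    history = []
    for _ in range(epochs):
        model.train()
        total, count = 0.0, 0
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
            total += loss.item() * inputs.shape[0]
            count += inputs.shape[0]
        history.append(total / count)
    return history


torch.manual_seed(0)
x = torch.randn(256, 3)
y = x @ torch.tensor([[1.0], [-2.0], [0.5]])  # a noiseless linear target
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
history = fit(model, loader, nn.MSELoss(), optimizer, epochs=20)
print(f"first epoch loss {history[0]:.4f}, last epoch loss {history[-1]:.6f}")
```

Keeping the loop as plain code is the tradeoff described above: anything unusual (multiple optimizers, custom schedules, GAN-style alternation) is a local edit rather than a callback API to learn.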

Govind K:
Question for the PyTorch team: Is it possible to dynamically create the __init__ and forward functions on the fly, given just a dataset? In other words, I am planning to pass a set of hyperparameters for a 1D CNN as arguments along with a dataset, specifying how many CNN layers and fully connected layers I will need, and I was hoping the model could be created without me writing out the sequence of layers. Any thoughts on this? So far I can build a complex model without manually changing the number of neurons in the fully connected layers by using the Conv1D output-size formula, and I was wondering if I can make it more flexible as described above. EDIT: I will have one Conv1D layer and one FC layer written up, so that these can be used to build the model architecture given a schema that I will create.
PyTorch Team:
For just assembling the model, torch.nn.Sequential gives you the idea, and you can build an arbitrarily fancy “mini framework” along these lines (e.g. keep lists/dicts of layers and orchestrate them in forward()). The part with shape inference for nn.Linear is indeed an annoying one. Today, you’d probably need to do it manually. E.g. when adding a Linear layer to the list, you could run the previous layers on fake data (torch.rand()) and see what shape comes out in order to construct the next layer (you can still run modules in __init__()). We probably should revisit whether adding this to the core would work out well for the majority of layers.
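The fake-data trick can be sketched as follows, assuming a hypothetical list-of-dicts schema and `build_model` helper. Each new layer is run once on a dummy batch, so the next layer's input size is read off the probe's shape instead of being derived from the Conv1d output-size formula. Linear layers are assumed to come after a flatten step. Newer PyTorch releases also include lazy modules such as `nn.LazyLinear`, which infer `in_features` on the first forward pass.

```python
import torch
from torch import nn


def build_model(schema, input_shape):
    """Build nn.Sequential from a hypothetical schema, inferring input sizes with a dummy batch."""
    layers = []
    probe = torch.rand(1, *input_shape)  # fake batch holding one sample
    for spec in schema:
        kind = spec["type"]
        if kind == "conv1d":
            layer = nn.Conv1d(probe.shape[1], spec["out_channels"], spec["kernel_size"])
        elif kind == "linear":  # assumes a preceding "flatten", so dim 1 holds the features
            layer = nn.Linear(probe.shape[1], spec["out_features"])
        elif kind == "flatten":
            layer = nn.Flatten()
        elif kind == "relu":
            layer = nn.ReLU()
        else:
            raise ValueError(f"unknown layer type: {kind}")
        with torch.no_grad():
            probe = layer(probe)  # run the new layer to learn the next input shape
        layers.append(layer)
    return nn.Sequential(*layers)


schema = [
    {"type": "conv1d", "out_channels": 16, "kernel_size": 5},
    {"type": "relu"},
    {"type": "conv1d", "out_channels": 32, "kernel_size": 3},
    {"type": "relu"},
    {"type": "flatten"},
    {"type": "linear", "out_features": 64},
    {"type": "relu"},
    {"type": "linear", "out_features": 2},
]
model = build_model(schema, input_shape=(4, 100))  # 4 channels, 100 time steps
print(model)
```

Here the first Linear layer ends up with 32 * 94 = 3008 input features without that number appearing anywhere in the schema. The probe runs under `torch.no_grad()`, so building the model doesn't record autograd history.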
